Repository: amusi/CVPR2026-Papers-with-Code
Branch: main
Commit: 5709455e269a
Files: 9
Total size: 316.0 KB

Directory structure:
gitextract_f2cckni0/

├── CVPR2019-Papers-with-Code.md
├── CVPR2020-Papers-with-Code.md
├── CVPR2021-Papers-with-Code.md
├── CVPR2022-Papers-with-Code.md
├── CVPR2023-Papers-with-Code.md
├── CVPR2024-Papers-with-Code.md
├── CVPR2025-Papers-with-Code.md
├── README.md
└── master

================================================
FILE CONTENTS
================================================

================================================
FILE: CVPR2019-Papers-with-Code.md
================================================
# CVPR2019-Code

A collection of CVPR 2019 papers with open-source code

Quick link: [CVPR 2020 Papers with Code](https://github.com/amusi/CVPR2020-Code)

Also: [Code links for 530 CVPR 2019 papers](./CVPR2019_CodeLink.csv)

- [Object Detection](#Object-Detection)
- [Object Tracking](#Object-Tracking)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [GAN](#GAN)
- [Face Detection](#Face-Detection)
- [Human Pose Estimation](#Human-Pose-Estimation)
- [6DoF Pose Estimation](#6DoF-Pose-Estimation)
- [Head Pose Estimation](#Head-Pose-Estimation)
- [Crowd Counting](#Crowd-Counting)

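**Changelog:**

- 20200226: Added [CVPR 2020 Papers with Code](https://github.com/amusi/CVPR2020-Code)
- 20191026: Added [code links for 530 papers](./CVPR2019_CodeLink.csv)
- 20190408: Added 6 papers (object tracking, GAN, 6DoF pose estimation, etc.)
- 20190405: Added 8 papers (object detection, semantic segmentation, etc.)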

<a name="Object-Detection"></a>

# Object Detection

**Bounding Box Regression with Uncertainty for Accurate Object Detection**

- arXiv: <https://arxiv.org/abs/1809.08545>

- GitHub: <https://github.com/yihui-he/KL-Loss>

<a name="Object-Tracking"></a>

# Object Tracking

**Fast Online Object Tracking and Segmentation: A Unifying Approach**

- arXiv: <https://arxiv.org/abs/1812.05050>

- GitHub: <https://github.com/foolwood/SiamMask>

- Homepage: <http://www.robots.ox.ac.uk/~qwang/SiamMask>

**Unsupervised Deep Tracking**

- arXiv: <https://arxiv.org/abs/1904.01828>

- GitHub: <https://github.com/594422814/UDT>

- GitHub (PyTorch): <https://github.com/594422814/UDT_pytorch>

**Target-Aware Deep Tracking**

- arXiv: <https://arxiv.org/abs/1904.01772>

- Homepage: <https://xinli-zn.github.io/TADT-project-page/>

<a name="Semantic-Segmentation"></a>

# Semantic Segmentation

**Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation**

- arXiv: <https://arxiv.org/abs/1903.02120>

- GitHub (unofficial): <https://github.com/LinZhuoChen/DUpsampling>

**Dual Attention Network for Scene Segmentation**

- arXiv: <https://arxiv.org/abs/1809.02983>

- GitHub: <https://github.com/junfu1115/DANet>

**Collaborative Global-Local Networks for Memory-Efficient Segmentation of Ultra-High Resolution Images**

- arXiv: None

- GitHub: <https://github.com/chenwydj/ultra_high_resolution_segmentation>

<a name="Instance-Segmentation"></a>

# Instance Segmentation

**Mask Scoring R-CNN**

- arXiv: <https://arxiv.org/abs/1903.00241>

- GitHub: <https://github.com/zjhuang22/maskscoring_rcnn>

<a name="GAN"></a>

# GAN

**Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis**

- arXiv: <https://arxiv.org/abs/1903.05628>
- GitHub: <https://github.com/HelenMao/MSGAN>

<a name="Face-Detection"></a>

# Face Detection

**DSFD: Dual Shot Face Detector**

- arXiv: <https://arxiv.org/abs/1810.10220>

- GitHub: <https://github.com/TencentYoutuResearch/FaceDetection-DSFD>

<a name="Human-Pose-Estimation"></a>

# Human Pose Estimation

**Deep High-Resolution Representation Learning for Human Pose Estimation**

- arXiv: <https://arxiv.org/abs/1902.09212>

- GitHub: <https://github.com/leoxiaobin/deep-high-resolution-net.pytorch>

<a name="6DoF-Pose-Estimation"></a>

# 6DoF Pose Estimation

**PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation**

- arXiv: <https://arxiv.org/abs/1812.11788>
- GitHub: <https://github.com/zju3dv/pvnet>

<a name="Head-Pose-Estimation"></a>

# Head Pose Estimation

**FSA-Net: Learning Fine-Grained Structure Aggregation for Head Pose Estimation from a Single Image**

- Paper: <https://github.com/shamangary/FSA-Net/blob/master/0191.pdf>
- GitHub: <https://github.com/shamangary/FSA-Net>

<a name="Crowd-Counting"></a>

# Crowd Counting

**Learning from Synthetic Data for Crowd Counting in the Wild**

- arXiv: <https://arxiv.org/abs/1903.03303>
- GitHub: <https://github.com/gjy3035/GCC-SFCN>
- Homepage: <https://gjy3035.github.io/GCC-CL/>

================================================
FILE: CVPR2020-Papers-with-Code.md
================================================
# CVPR2020-Code

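[CVPR 2020](https://openaccess.thecvf.com/CVPR2020) papers with open-source code. Contributions are welcome: feel free to open an issue to share CVPR 2020 open-source projects.

**Recommended Reading**

- [CVPR 2020 virtual](http://cvpr20.com/)
- ECCV 2020 Papers with Code is now available: https://github.com/amusi/ECCV2020-Code
- For papers from past top CV conferences (e.g., ECCV 2020, CVPR 2019, ICCV 2019), as well as other high-quality CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision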

**CVPR 2020 Papers with Code: Table of Contents**

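- [CNN](#CNN)
- [Image Classification](#Image-Classification)
- [Video Classification](#Video-Classification)
- [Object Detection](#Object-Detection)
- [3D Object Detection](#3D-Object-Detection)
- [Video Object Detection](#Video-Object-Detection)
- [Object Tracking](#Object-Tracking)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Panoptic Segmentation](#Panoptic-Segmentation)
- [Video Object Segmentation](#VOS)
- [Superpixel Segmentation](#Superpixel)
- [Interactive Image Segmentation](#IIS)
- [NAS](#NAS)
- [GAN](#GAN)
- [Re-ID](#Re-ID)
- [3D Point Cloud (Classification/Segmentation/Registration/Tracking, etc.)](#3D-PointCloud)
- [Face (Recognition/Detection/Reconstruction, etc.)](#Face)
- [Human Pose Estimation (2D/3D)](#Human-Pose-Estimation)
- [Human Parsing](#Human-Parsing)
- [Scene Text Detection](#Scene-Text-Detection)
- [Scene Text Recognition](#Scene-Text-Recognition)
- [Feature (Point) Detection and Description](#Feature)
- [Super-Resolution](#Super-Resolution)
- [Model Compression/Pruning](#Model-Compression)
- [Video Understanding/Action Recognition](#Action-Recognition)
- [Crowd Counting](#Crowd-Counting)
- [Depth Estimation](#Depth-Estimation)
- [6D Object Pose Estimation](#6DOF)
- [Hand Pose Estimation](#Hand-Pose)
- [Saliency Detection](#Saliency)
- [Denoising](#Denoising)
- [Deraining](#Deraining)
- [Deblurring](#Deblurring)
- [Dehazing](#Dehazing)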
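- [Visual Question Answering (VQA)](#VQA)
- [Video Question Answering (VideoQA)](#VideoQA)
- [Vision-and-Language Navigation](#VLN)
- [Video Compression](#Video-Compression)
- [Video Frame Interpolation](#Video-Frame-Interpolation)
- [Style Transfer](#Style-Transfer)
- [Lane Detection](#Lane-Detection)
- [Human-Object Interaction (HOI) Detection](#HOI)
- [Trajectory Prediction](#TP)
- [Motion Prediction](#Motion-Predication)
- [Optical Flow Estimation](#OF)
- [Image Retrieval](#IR)
- [Virtual Try-On](#Virtual-Try-On)
- [HDR](#HDR)
- [Adversarial Examples](#AE)
- [3D Reconstruction](#3D-Reconstructing)
- [Depth Completion](#DC)
- [Semantic Scene Completion](#SSC)
- [Image/Video Captioning](#Captioning)
- [Wireframe Parsing](#WP)
- [Datasets](#Datasets)
- [Others](#Others)
- [Acceptance Uncertain](#Not-Sure)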

<a name="CNN"></a>

# CNN

**Exploring Self-attention for Image Recognition**

- Paper: https://hszhao.github.io/papers/cvpr20_san.pdf

- Code: https://github.com/hszhao/SAN

**Improving Convolutional Networks with Self-Calibrated Convolutions**

- Homepage: https://mmcheng.net/scconv/

- Paper: http://mftp.mmcheng.net/Papers/20cvprSCNet.pdf
- Code: https://github.com/backseason/SCNet

**Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets**

- Paper: https://arxiv.org/abs/2003.13549
- Code: https://github.com/zeiss-microscopy/BSConv

<a name="Image-Classification"></a>

# Image Classification

**Interpretable and Accurate Fine-grained Recognition via Region Grouping**

- Paper: https://arxiv.org/abs/2005.10411

- Code: https://github.com/zxhuang1698/interpretability-by-parts

**Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion**

- Paper: https://arxiv.org/abs/2003.04490

- Code: https://github.com/AdamKortylewski/CompositionalNets

**Spatially Attentive Output Layer for Image Classification**

- Paper: https://arxiv.org/abs/2004.07570
- Code (appears to have been deleted by the original author): https://github.com/ildoonet/spatially-attentive-output-layer

<a name="Video-Classification"></a>

# Video Classification

**SmallBigNet: Integrating Core and Contextual Views for Video Classification**

- Paper: https://arxiv.org/abs/2006.14582
- Code: https://github.com/xhl-video/SmallBigNet

<a name="Object-Detection"></a>

# Object Detection

**Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Li_Overcoming_Classifier_Imbalance_for_Long-Tail_Object_Detection_With_Balanced_Group_CVPR_2020_paper.pdf
- Code: https://github.com/FishYuLi/BalancedGroupSoftmax

**AugFPN: Improving Multi-scale Feature Learning for Object Detection**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Guo_AugFPN_Improving_Multi-Scale_Feature_Learning_for_Object_Detection_CVPR_2020_paper.pdf
- Code: https://github.com/Gus-Guo/AugFPN

**Noise-Aware Fully Webly Supervised Object Detection**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Shen_Noise-Aware_Fully_Webly_Supervised_Object_Detection_CVPR_2020_paper.html
- Code: https://github.com/shenyunhang/NA-fWebSOD/

**Learning a Unified Sample Weighting Network for Object Detection**

- Paper: https://arxiv.org/abs/2006.06568
- Code: https://github.com/caiqi/sample-weighting-network

**D2Det: Towards High Quality Object Detection and Instance Segmentation**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Cao_D2Det_Towards_High_Quality_Object_Detection_and_Instance_Segmentation_CVPR_2020_paper.pdf

- Code: https://github.com/JialeCao001/D2Det

**Dynamic Refinement Network for Oriented and Densely Packed Object Detection**

- Paper: https://arxiv.org/abs/2005.09973

- Code and dataset: https://github.com/Anymake/DRN_CVPR2020

**Scale-Equalizing Pyramid Convolution for Object Detection**

- Paper: https://arxiv.org/abs/2005.03101

- Code: https://github.com/jshilong/SEPC

**Revisiting the Sibling Head in Object Detector**

- Paper: https://arxiv.org/abs/2003.07540

- Code: https://github.com/Sense-X/TSD

**Detection in Crowded Scenes: One Proposal, Multiple Predictions**

- Paper: https://arxiv.org/abs/2003.09163
- Code: https://github.com/megvii-model/CrowdDetection

**Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection**

- Paper: https://arxiv.org/abs/2004.04725
- Code: https://github.com/NVlabs/wetectron

**Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection**

- Paper: https://arxiv.org/abs/1912.02424
- Code: https://github.com/sfzhang15/ATSS

**BiDet: An Efficient Binarized Object Detector**

- Paper: https://arxiv.org/abs/2003.03961
- Code: https://github.com/ZiweiWangTHU/BiDet

**Harmonizing Transferability and Discriminability for Adapting Object Detectors**

- Paper: https://arxiv.org/abs/2003.06297
- Code: https://github.com/chaoqichen/HTCN

**CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection**

- Paper: https://arxiv.org/abs/2003.09119
- Code: https://github.com/KiveeDong/CentripetalNet

**Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection**

- Paper: https://arxiv.org/abs/2003.11818
- Code: https://github.com/ggjy/HitDet.pytorch

**EfficientDet: Scalable and Efficient Object Detection**

- Paper: https://arxiv.org/abs/1911.09070
- Code: https://github.com/google/automl/tree/master/efficientdet

<a name="3D-Object-Detection"></a>

# 3D Object Detection

**SESS: Self-Ensembling Semi-Supervised 3D Object Detection**

- Paper: https://arxiv.org/abs/1912.11803

- Code: https://github.com/Na-Z/sess

**Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection**

- Paper: https://arxiv.org/abs/2006.04356

- Code: https://github.com/dleam/Associate-3Ddet

**What You See is What You Get: Exploiting Visibility for 3D Object Detection**

- Homepage: https://www.cs.cmu.edu/~peiyunh/wysiwyg/

- Paper: https://arxiv.org/abs/1912.04986
- Code: https://github.com/peiyunh/wysiwyg

**Learning Depth-Guided Convolutions for Monocular 3D Object Detection**

- Paper: https://arxiv.org/abs/1912.04799
- Code: https://github.com/dingmyu/D4LCN

**Structure Aware Single-stage 3D Object Detection from Point Cloud**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/He_Structure_Aware_Single-Stage_3D_Object_Detection_From_Point_Cloud_CVPR_2020_paper.html

- Code: https://github.com/skyhehe123/SA-SSD

**IDA-3D: Instance-Depth-Aware 3D Object Detection from Stereo Vision for Autonomous Driving**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Peng_IDA-3D_Instance-Depth-Aware_3D_Object_Detection_From_Stereo_Vision_for_Autonomous_CVPR_2020_paper.pdf

- Code: https://github.com/swords123/IDA-3D

**Train in Germany, Test in The USA: Making 3D Object Detectors Generalize**

- Paper: https://arxiv.org/abs/2005.08139

- Code: https://github.com/cxy1997/3D_adapt_auto_driving

**MLCVNet: Multi-Level Context VoteNet for 3D Object Detection**

- Paper: https://arxiv.org/abs/2004.05679
- Code: https://github.com/NUAAXQ/MLCVNet

**3DSSD: Point-based 3D Single Stage Object Detector**

- CVPR 2020 Oral

- Paper: https://arxiv.org/abs/2002.10187

- Code: https://github.com/tomztyang/3DSSD

**Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation**

- Paper: https://arxiv.org/abs/2004.03572

- Code: https://github.com/zju3dv/disprcn

**End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection**

- Paper: https://arxiv.org/abs/2004.03080

- Code: https://github.com/mileyan/pseudo-LiDAR_e2e

**DSGN: Deep Stereo Geometry Network for 3D Object Detection**

- Paper: https://arxiv.org/abs/2001.03398
- Code: https://github.com/chenyilun95/DSGN

**LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention**

- Paper: https://arxiv.org/abs/2004.01389
- Code: https://github.com/yinjunbo/3DVID

**PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection**

- Paper: https://arxiv.org/abs/1912.13192

- Code: https://github.com/sshaoshuai/PV-RCNN

**Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud**

- Paper: https://arxiv.org/abs/2003.01251
- Code: https://github.com/WeijingShi/Point-GNN

<a name="Video-Object-Detection"></a>

# Video Object Detection

**Memory Enhanced Global-Local Aggregation for Video Object Detection**

- Paper: https://arxiv.org/abs/2003.12063

- Code: https://github.com/Scalsol/mega.pytorch

<a name="Object-Tracking"></a>

# Object Tracking

**SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking**

- Paper: https://arxiv.org/abs/1911.07241
- Code: https://github.com/ohhhyeahhh/SiamCAR

**D3S -- A Discriminative Single Shot Segmentation Tracker**

- Paper: https://arxiv.org/abs/1911.08862
- Code: https://github.com/alanlukezic/d3s

**ROAM: Recurrently Optimizing Tracking Model**

- Paper: https://arxiv.org/abs/1907.12006

- Code: https://github.com/skyoung/ROAM

**Siam R-CNN: Visual Tracking by Re-Detection**

- Homepage: https://www.vision.rwth-aachen.de/page/siamrcnn
- Paper: https://arxiv.org/abs/1911.12836
- Paper (alternate link): https://www.vision.rwth-aachen.de/media/papers/192/siamrcnn.pdf
- Code: https://github.com/VisualComputingInstitute/SiamR-CNN

**Cooling-Shrinking Attack: Blinding the Tracker with Imperceptible Noises**

- Paper: https://arxiv.org/abs/2003.09595
- Code: https://github.com/MasterBin-IIAU/CSA

**High-Performance Long-Term Tracking with Meta-Updater**

- Paper: https://arxiv.org/abs/2004.00305

- Code: https://github.com/Daikenan/LTMU

**AutoTrack: Towards High-Performance Visual Tracking for UAV with Automatic Spatio-Temporal Regularization**

- Paper: https://arxiv.org/abs/2003.12949

- Code: https://github.com/vision4robotics/AutoTrack

**Probabilistic Regression for Visual Tracking**

- Paper: https://arxiv.org/abs/2003.12565
- Code: https://github.com/visionml/pytracking

**MAST: A Memory-Augmented Self-supervised Tracker**

- Paper: https://arxiv.org/abs/2002.07793
- Code: https://github.com/zlai0/MAST

**Siamese Box Adaptive Network for Visual Tracking**

- Paper: https://arxiv.org/abs/2003.06761
- Code: https://github.com/hqucv/siamban

## Multi-Object Tracking

**3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset**

- Homepage: https://vap.aau.dk/3d-zef/
- Paper: https://arxiv.org/abs/2006.08466
- Code: https://bitbucket.org/aauvap/3d-zef/src/master/
- Dataset: https://motchallenge.net/data/3D-ZeF20

<a name="Semantic-Segmentation"></a>

# Semantic Segmentation

**FDA: Fourier Domain Adaptation for Semantic Segmentation**

- Paper: https://arxiv.org/abs/2004.05498

- Code: https://github.com/YanchaoYang/FDA

**Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation**

- Paper: not yet available

- Code: https://github.com/JianqiangWan/Super-BPD

**Single-Stage Semantic Segmentation from Image Labels**

- Paper: https://arxiv.org/abs/2005.08104

- Code: https://github.com/visinf/1-stage-wseg

**Learning Texture Invariant Representation for Domain Adaptation of Semantic Segmentation**

- Paper: https://arxiv.org/abs/2003.00867
- Code: https://github.com/MyeongJin-Kim/Learning-Texture-Invariant-Representation

**MSeg: A Composite Dataset for Multi-domain Semantic Segmentation**

- Paper: http://vladlen.info/papers/MSeg.pdf
- Code: https://github.com/mseg-dataset/mseg-api

**CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement**

- Paper: https://arxiv.org/abs/2005.02551
- Code: https://github.com/hkchengrex/CascadePSP

**Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision**

- Oral
- Paper: https://arxiv.org/abs/2004.07703
- Code: https://github.com/feipan664/IntraDA

**Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2004.04581
- Code: https://github.com/YudeWang/SEAM

**Temporally Distributed Networks for Fast Video Segmentation**

- Paper: https://arxiv.org/abs/2004.01800

- Code: https://github.com/feinanshan/TDNet

**Context Prior for Scene Segmentation**

- Paper: https://arxiv.org/abs/2004.01547

- Code: https://git.io/ContextPrior

**Strip Pooling: Rethinking Spatial Pooling for Scene Parsing**

- Paper: https://arxiv.org/abs/2003.13328

- Code: https://github.com/Andrew-Qibin/SPNet

**Cars Can't Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks**

- Paper: https://arxiv.org/abs/2003.05128
- Code: https://github.com/shachoi/HANet

**Learning Dynamic Routing for Semantic Segmentation**

- Paper: https://arxiv.org/abs/2003.10401

- Code: https://github.com/yanwei-li/DynamicRouting

<a name="Instance-Segmentation"></a>

# Instance Segmentation

**D2Det: Towards High Quality Object Detection and Instance Segmentation**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Cao_D2Det_Towards_High_Quality_Object_Detection_and_Instance_Segmentation_CVPR_2020_paper.pdf

- Code: https://github.com/JialeCao001/D2Det

**PolarMask: Single Shot Instance Segmentation with Polar Representation**

- Paper: https://arxiv.org/abs/1909.13226
- Code: https://github.com/xieenze/PolarMask
- Explainer (Chinese): https://zhuanlan.zhihu.com/p/84890413

**CenterMask: Real-Time Anchor-Free Instance Segmentation**

- Paper: https://arxiv.org/abs/1911.06667
- Code: https://github.com/youngwanLEE/CenterMask

**BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation**

- Paper: https://arxiv.org/abs/2001.00309
- Code: https://github.com/aim-uofa/AdelaiDet

**Deep Snake for Real-Time Instance Segmentation**

- Paper: https://arxiv.org/abs/2001.01629
- Code: https://github.com/zju3dv/snake

**Mask Encoding for Single Shot Instance Segmentation**

- Paper: https://arxiv.org/abs/2003.11712

- Code: https://github.com/aim-uofa/AdelaiDet

<a name="Panoptic-Segmentation"></a>

# Panoptic Segmentation

**Video Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2006.11339
- Code: https://github.com/mcahny/vps
- Dataset: https://www.dropbox.com/s/ecem4kq0fdkver4/cityscapes-vps-dataset-1.0.zip?dl=0

**Pixel Consensus Voting for Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2004.01849
- Code: not yet released

**BANet: Bidirectional Aggregation Network with Occlusion Handling for Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2003.14031

- Code: https://github.com/Mooonside/BANet

<a name="VOS"></a>

# Video Object Segmentation

**A Transductive Approach for Video Object Segmentation**

- Paper: https://arxiv.org/abs/2004.07193

- Code: https://github.com/microsoft/transductive-vos.pytorch

**State-Aware Tracker for Real-Time Video Object Segmentation**

- Paper: https://arxiv.org/abs/2003.00482

- Code: https://github.com/MegviiDetection/video_analyst

**Learning Fast and Robust Target Models for Video Object Segmentation**

- Paper: https://arxiv.org/abs/2003.00908
- Code: https://github.com/andr345/frtm-vos

**Learning Video Object Segmentation from Unlabeled Videos**

- Paper: https://arxiv.org/abs/2003.05020
- Code: https://github.com/carrierlxk/MuG

<a name="Superpixel"></a>

# Superpixel Segmentation

**Superpixel Segmentation with Fully Convolutional Networks**

- Paper: https://arxiv.org/abs/2003.12929
- Code: https://github.com/fuy34/superpixel_fcn

<a name="IIS"></a>

# Interactive Image Segmentation

**Interactive Object Segmentation with Inside-Outside Guidance**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Interactive_Object_Segmentation_With_Inside-Outside_Guidance_CVPR_2020_paper.pdf
- Code: https://github.com/shiyinzhang/Inside-Outside-Guidance
- Dataset: https://github.com/shiyinzhang/Pixel-ImageNet

<a name="NAS"></a>

# NAS

**AOWS: Adaptive and optimal network width search with latency constraints**

- Paper: https://arxiv.org/abs/2005.10481
- Code: https://github.com/bermanmaxim/AOWS

**Densely Connected Search Space for More Flexible Neural Architecture Search**

- Paper: https://arxiv.org/abs/1906.09607

- Code: https://github.com/JaminFong/DenseNAS

**MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning**

- Paper: https://arxiv.org/abs/2003.14058

- Code: https://github.com/bhpfelix/MTLNAS

**FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions**

- Paper: https://arxiv.org/abs/2004.05565

- Code: https://github.com/facebookresearch/mobile-vision

**Neural Architecture Search for Lightweight Non-Local Networks**

- Paper: https://arxiv.org/abs/2004.01961
- Code: https://github.com/LiYingwei/AutoNL

**Rethinking Performance Estimation in Neural Architecture Search**

- Paper: https://arxiv.org/abs/2005.09917
- Code: https://github.com/zhengxiawu/rethinking_performance_estimation_in_NAS
- Explainer 1 (Chinese): https://www.zhihu.com/question/372070853/answer/1035234510
- Explainer 2 (Chinese): https://zhuanlan.zhihu.com/p/111167409

**CARS: Continuous Evolution for Efficient Neural Architecture Search**

- Paper: https://arxiv.org/abs/1909.04977
- Code (coming soon): https://github.com/huawei-noah/CARS

<a name="GAN"></a>

# GAN

**SEAN: Image Synthesis with Semantic Region-Adaptive Normalization**

- Paper: https://arxiv.org/abs/1911.12861
- Code: https://github.com/ZPdesu/SEAN

**Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Chen_Reusing_Discriminators_for_Encoding_Towards_Unsupervised_Image-to-Image_Translation_CVPR_2020_paper.html
- Code: https://github.com/alpc91/NICE-GAN-pytorch

**Distribution-induced Bidirectional Generative Adversarial Network for Graph Representation Learning**

- Paper: https://arxiv.org/abs/1912.01899
- Code: https://github.com/SsGood/DBGAN

**PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer**

- Paper: https://arxiv.org/abs/1909.06956
- Code: https://github.com/wtjiang98/PSGAN

**Semantically Multi-modal Image Synthesis**

- Homepage: http://seanseattle.github.io/SMIS
- Paper: https://arxiv.org/abs/2003.12697
- Code: https://github.com/Seanseattle/SMIS

**Unpaired Portrait Drawing Generation via Asymmetric Cycle Mapping**

- Paper: https://yiranran.github.io/files/CVPR2020_Unpaired%20Portrait%20Drawing%20Generation%20via%20Asymmetric%20Cycle%20Mapping.pdf
- Code: https://github.com/yiranran/Unpaired-Portrait-Drawing

**Learning to Cartoonize Using White-box Cartoon Representations**

- Paper: https://github.com/SystemErrorWang/White-box-Cartoonization/blob/master/paper/06791.pdf

- Homepage: https://systemerrorwang.github.io/White-box-Cartoonization/
- Code: https://github.com/SystemErrorWang/White-box-Cartoonization
- Explainer (Chinese): https://zhuanlan.zhihu.com/p/117422157
- Demo video: https://www.bilibili.com/video/av56708333

**GAN Compression: Efficient Architectures for Interactive Conditional GANs**

- Paper: https://arxiv.org/abs/2003.08936

- Code: https://github.com/mit-han-lab/gan-compression

**Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions**

- Paper: https://arxiv.org/abs/2003.01826
- Code: https://github.com/cc-hpc-itwm/UpConv

<a name="Re-ID"></a>

# Re-ID

**High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Wang_High-Order_Information_Matters_Learning_Relation_and_Topology_for_Occluded_Person_CVPR_2020_paper.html
- Code: https://github.com/wangguanan/HOReID

**COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification**

- Paper: https://arxiv.org/abs/2005.07862

- Dataset: not yet available

**Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification With Deep Mis-Ranking**

- Paper: https://arxiv.org/abs/2004.04199

- Code: https://github.com/whj363636/Adversarial-attack-on-Person-ReID-With-Deep-Mis-Ranking

**Pose-guided Visible Part Matching for Occluded Person ReID**

- Paper: https://arxiv.org/abs/2004.00230
- Code: https://github.com/hh23333/PVPM

**Weakly supervised discriminative feature learning with state information for person identification**

- Paper: https://arxiv.org/abs/2002.11939
- Code: https://github.com/KovenYu/state-information

<a name="3D-PointCloud"></a>

# 3D Point Cloud (Classification/Segmentation/Registration, etc.)

## 3D Point Cloud Convolution

**PointASNL: Robust Point Clouds Processing using Nonlocal Neural Networks with Adaptive Sampling**

- Paper: https://arxiv.org/abs/2003.00492
- Code: https://github.com/yanx27/PointASNL

**Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds**

- Paper: https://arxiv.org/abs/2003.12971

- Code: https://github.com/raoyongming/PointGLR

**Grid-GCN for Fast and Scalable Point Cloud Learning**

- Paper: https://arxiv.org/abs/1912.02984

- Code: https://github.com/Xharlie/Grid-GCN

**FPConv: Learning Local Flattening for Point Convolution**

- Paper: https://arxiv.org/abs/2002.10701
- Code: https://github.com/lyqun/FPConv

## 3D Point Cloud Classification

**PointAugment: an Auto-Augmentation Framework for Point Cloud Classification**

- Paper: https://arxiv.org/abs/2002.10876
- Code (coming soon): https://github.com/liruihui/PointAugment/

## 3D Point Cloud Semantic Segmentation

**RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds**

- Paper: https://arxiv.org/abs/1911.11236
- Code: https://github.com/QingyongHu/RandLA-Net

- Explainer (Chinese): https://zhuanlan.zhihu.com/p/105433460

**Weakly Supervised Semantic Point Cloud Segmentation: Towards 10X Fewer Labels**

- Paper: https://arxiv.org/abs/2004.04091

- Code: https://github.com/alex-xun-xu/WeakSupPointCloudSeg

**PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation**

- Paper: https://arxiv.org/abs/2003.14032
- Code: https://github.com/edwardzhou130/PolarSeg

**Learning to Segment 3D Point Clouds in 2D Image Space**

- Paper: https://arxiv.org/abs/2003.05593

- Code: https://github.com/WPI-VISLab/Learning-to-Segment-3D-Point-Clouds-in-2D-Image-Space

## 3D Point Cloud Instance Segmentation

**PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation**

- Paper: https://arxiv.org/abs/2004.01658
- Code: https://github.com/Jia-Research-Lab/PointGroup

## 3D Point Cloud Registration

**Feature-metric Registration: A Fast Semi-supervised Approach for Robust Point Cloud Registration without Correspondences**

- Paper: https://arxiv.org/abs/2005.01014
- Code: https://github.com/XiaoshuiHuang/fmr

**D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features**

- Paper: https://arxiv.org/abs/2003.03164
- Code: https://github.com/XuyangBai/D3Feat

**RPM-Net: Robust Point Matching using Learned Features**

- Paper: https://arxiv.org/abs/2003.13479
- Code: https://github.com/yewzijian/RPMNet

## 3D Point Cloud Completion

**Cascaded Refinement Network for Point Cloud Completion**

- Paper: https://arxiv.org/abs/2004.03327
- Code: https://github.com/xiaogangw/cascaded-point-completion

## 3D Point Cloud Object Tracking

**P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds**

- Paper: https://arxiv.org/abs/2005.13888
- Code: https://github.com/HaozheQi/P2B

## Others

**An Efficient PointLSTM for Point Clouds Based Gesture Recognition**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Min_An_Efficient_PointLSTM_for_Point_Clouds_Based_Gesture_Recognition_CVPR_2020_paper.html
- Code: https://github.com/Blueprintf/pointlstm-gesture-recognition-pytorch

<a name="Face"></a>

# Face

## Face Recognition

**CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition**

- Paper: https://arxiv.org/abs/2004.00288

- Code: https://github.com/HuangYG123/CurricularFace

**Learning Meta Face Recognition in Unseen Domains**

- Paper: https://arxiv.org/abs/2003.07733
- Code: https://github.com/cleardusk/MFR
- Explainer (Chinese): https://mp.weixin.qq.com/s/YZoEnjpnlvb90qSI3xdJqQ

## Face Detection

## Face Anti-Spoofing

**Searching Central Difference Convolutional Networks for Face Anti-Spoofing**

- Paper: https://arxiv.org/abs/2003.04092

- Code: https://github.com/ZitongYu/CDCN

## Facial Expression Recognition

**Suppressing Uncertainties for Large-Scale Facial Expression Recognition**

- Paper: https://arxiv.org/abs/2002.10392

- Code (coming soon): https://github.com/kaiwang960112/Self-Cure-Network

## Face Frontalization

**Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images**

- Paper: https://arxiv.org/abs/2003.08124
- Code: https://github.com/Hangz-nju-cuhk/Rotate-and-Render

## 3D Face Reconstruction

**AvatarMe: Realistically Renderable 3D Facial Reconstruction "in-the-wild"**

- Paper: https://arxiv.org/abs/2003.13845
- Dataset: https://github.com/lattas/AvatarMe

**FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction**

- Paper: https://arxiv.org/abs/2003.13989
- Code: https://github.com/zhuhao-nju/facescape

<a name="Human-Pose-Estimation"></a>

# Human Pose Estimation (2D/3D)

## 2D Human Pose Estimation

**TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting**

- Homepage: https://yzhq97.github.io/transmomo/

- Paper: https://arxiv.org/abs/2003.14401
- Code: https://github.com/yzhq97/transmomo.pytorch

**HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation**

- Paper: https://arxiv.org/abs/1908.10357
- Code: https://github.com/HRNet/HigherHRNet-Human-Pose-Estimation

**The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation**

- Paper: https://arxiv.org/abs/1911.07524
- Code: https://github.com/HuangJunJie2017/UDP-Pose
- Explainer (Chinese): https://zhuanlan.zhihu.com/p/92525039

**Distribution-Aware Coordinate Representation for Human Pose Estimation**

- Homepage: https://ilovepose.github.io/coco/

- Paper: https://arxiv.org/abs/1910.06278

- Code: https://github.com/ilovepose/DarkPose

## 3D Human Pose Estimation

**Cascaded Deep Monocular 3D Human Pose Estimation With Evolutionary Training Data**

- Paper: https://arxiv.org/abs/2006.07778
- Code: https://github.com/Nicholasli1995/EvoSkeleton

**Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach**

- Homepage: https://www.zhe-zhang.com/cvpr2020
- Paper: https://arxiv.org/abs/2003.11163

- Code: https://github.com/CHUNYUWANG/imu-human-pose-pytorch

**Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data**

- Paper: https://arxiv.org/abs/2004.01166

- Code: https://github.com/Healthcare-Robotics/bodies-at-rest
- Dataset: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KOA4ML

**Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis**

- Homepage: http://val.cds.iisc.ac.in/pgp-human/
- Paper: https://arxiv.org/abs/2004.04400

**Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation**

- Paper: https://arxiv.org/abs/2004.00329
- Code: https://github.com/fabbrimatteo/LoCO

**VIBE: Video Inference for Human Body Pose and Shape Estimation**

- Paper: https://arxiv.org/abs/1912.05656
- Code: https://github.com/mkocabas/VIBE

**Back to the Future: Joint Aware Temporal Deep Learning 3D Human Pose Estimation**

- Paper: https://arxiv.org/abs/2002.11251
- Code: https://github.com/vnmr/JointVideoPose3D

**Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS**

- Paper: https://arxiv.org/abs/2003.03972
- Dataset: not yet available

<a name="Human-Parsing"></a>

# Human Parsing

**Correlating Edge, Pose with Parsing**

- Paper: https://arxiv.org/abs/2005.01431

- Code: https://github.com/ziwei-zh/CorrPM

<a name="Scene-Text-Detection"></a>

# Scene Text Detection

**STEFANN: Scene Text Editor using Font Adaptive Neural Network**

- Homepage: https://prasunroy.github.io/stefann/

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Roy_STEFANN_Scene_Text_Editor_Using_Font_Adaptive_Neural_Network_CVPR_2020_paper.html
- Code: https://github.com/prasunroy/stefann
- Dataset: https://drive.google.com/open?id=1sEDiX_jORh2X-HSzUnjIyZr-G9LJIw1k

**ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_ContourNet_Taking_a_Further_Step_Toward_Accurate_Arbitrary-Shaped_Scene_Text_CVPR_2020_paper.pdf
- Code: https://github.com/wangyuxin87/ContourNet

**UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**

- Paper: https://arxiv.org/abs/2003.10608
- Code and dataset: https://github.com/Jyouhou/UnrealText/

**ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network**

- Paper: https://arxiv.org/abs/2002.10200
- Code (coming soon): https://github.com/Yuliang-Liu/bezier_curve_text_spotting
- Code (coming soon): https://github.com/aim-uofa/adet

**Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection**

- Paper: https://arxiv.org/abs/2003.07493

- Code: https://github.com/GXYM/DRRG

<a name="Scene-Text-Recognition"></a>

# Scene Text Recognition

**SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition**

- Paper: https://arxiv.org/abs/2005.10977
- Code: https://github.com/Pay20Y/SEED

**UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**

- Paper: https://arxiv.org/abs/2003.10608
- Code and dataset: https://github.com/Jyouhou/UnrealText/

**ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network**

- Paper: https://arxiv.org/abs/2002.10200
- Code (coming soon): https://github.com/aim-uofa/adet

**Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition**

- Paper: https://arxiv.org/abs/2003.06606

- Code: https://github.com/Canjie-Luo/Text-Image-Augmentation

<a name="Feature"></a>

# Feature (Point) Detection and Description

**SuperGlue: Learning Feature Matching with Graph Neural Networks**

- Paper: https://arxiv.org/abs/1911.11763
- Code: https://github.com/magicleap/SuperGluePretrainedNetwork

<a name="Super-Resolution"></a>

# Super-Resolution

## Image Super-Resolution

**Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Guo_Closed-Loop_Matters_Dual_Regression_Networks_for_Single_Image_Super-Resolution_CVPR_2020_paper.html
- Code: https://github.com/guoyongcs/DRN

**Learning Texture Transformer Network for Image Super-Resolution**

- Paper: https://arxiv.org/abs/2006.04139

- Code: https://github.com/FuzhiYang/TTSR

**Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining**

- Paper: https://arxiv.org/abs/2006.01424
- Code: https://github.com/SHI-Labs/Cross-Scale-Non-Local-Attention

**Structure-Preserving Super Resolution with Gradient Guidance**

- Paper: https://arxiv.org/abs/2003.13081

- Code: https://github.com/Maclory/SPSR

**Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy**

- Paper: https://arxiv.org/abs/2004.00448

- Code: https://github.com/clovaai/cutblur

## Video Super-Resolution

**TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution**

- Paper: https://arxiv.org/abs/1812.02898
- Code: https://github.com/YapengTian/TDAN-VSR-CVPR-2020

**Space-Time-Aware Multi-Resolution Video Enhancement**

- Homepage: https://alterzero.github.io/projects/STAR.html
- Paper: http://arxiv.org/abs/2003.13170
- Code: https://github.com/alterzero/STARnet

**Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution**

- Paper: https://arxiv.org/abs/2002.11616
- Code: https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020

<a name="Model-Compression"></a>

# Model Compression/Pruning

**DMCP: Differentiable Markov Channel Pruning for Neural Networks**

- Paper: https://arxiv.org/abs/2005.03354
- Code: https://github.com/zx55/dmcp

**Forward and Backward Information Retention for Accurate Binary Neural Networks**

- Paper: https://arxiv.org/abs/1909.10788

- Code: https://github.com/htqin/IR-Net

**Towards Efficient Model Compression via Learned Global Ranking**

- Paper: https://arxiv.org/abs/1904.12368
- Code: https://github.com/cmu-enyac/LeGR

**HRank: Filter Pruning using High-Rank Feature Map**

- Paper: http://arxiv.org/abs/2002.10179
- Code: https://github.com/lmbxmu/HRank

**GAN Compression: Efficient Architectures for Interactive Conditional GANs**

- Paper: https://arxiv.org/abs/2003.08936

- Code: https://github.com/mit-han-lab/gan-compression

**Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression**

- Paper: https://arxiv.org/abs/2003.08935

- Code: https://github.com/ofsoundof/group_sparsity

<a name="Action-Recognition"></a>

# Video Understanding/Action Recognition

**Oops! Predicting Unintentional Action in Video**

- Homepage: https://oops.cs.columbia.edu/

- Paper: https://arxiv.org/abs/1911.11206
- Code: https://github.com/cvlab-columbia/oops
- Dataset: https://oops.cs.columbia.edu/data

**PREDICT & CLUSTER: Unsupervised Skeleton Based Action Recognition**

- Paper: https://arxiv.org/abs/1911.12409
- Code: https://github.com/shlizee/Predict-Cluster

**Intra- and Inter-Action Understanding via Temporal Action Parsing**

- Paper: https://arxiv.org/abs/2005.10229
- Homepage and dataset: https://sdolivia.github.io/TAPOS/

**3DV: 3D Dynamic Voxel for Action Recognition in Depth Video**

- Paper: https://arxiv.org/abs/2005.05501
- Code: https://github.com/3huo/3DV-Action

**FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding**

- Homepage: https://sdolivia.github.io/FineGym/
- Paper: https://arxiv.org/abs/2004.06704

**TEA: Temporal Excitation and Aggregation for Action Recognition**

- Paper: https://arxiv.org/abs/2004.01398

- Code: https://github.com/Phoenix1327/tea-action-recognition

**X3D: Expanding Architectures for Efficient Video Recognition**

- Paper: https://arxiv.org/abs/2004.04730

- Code: https://github.com/facebookresearch/SlowFast

**Temporal Pyramid Network for Action Recognition**

- Homepage: https://decisionforce.github.io/TPN

- Paper: https://arxiv.org/abs/2004.03548
- Code: https://github.com/decisionforce/TPN

## Skeleton-Based Action Recognition

**Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition**

- Paper: https://arxiv.org/abs/2003.14111
- Code: https://github.com/kenziyuliu/ms-g3d

<a name="Crowd-Counting"></a>

# Crowd Counting

<a name="Depth-Estimation"></a>

# Depth Estimation

**BiFuse: Monocular 360° Depth Estimation via Bi-Projection Fusion**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_BiFuse_Monocular_360_Depth_Estimation_via_Bi-Projection_Fusion_CVPR_2020_paper.pdf
- Code: https://github.com/Yeh-yu-hsuan/BiFuse

**Focus on defocus: bridging the synthetic to real domain gap for depth estimation**

- Paper: https://arxiv.org/abs/2005.09623
- Code: https://github.com/dvl-tum/defocus-net

**Bi3D: Stereo Depth Estimation via Binary Classifications**

- Paper: https://arxiv.org/abs/2005.07274

- Code: https://github.com/NVlabs/Bi3D

**AANet: Adaptive Aggregation Network for Efficient Stereo Matching**

- Paper: https://arxiv.org/abs/2004.09548
- Code: https://github.com/haofeixu/aanet

**Towards Better Generalization: Joint Depth-Pose Learning without PoseNet**

- Paper: https://github.com/B1ueber2y/TrianFlow

- Code: https://github.com/B1ueber2y/TrianFlow

## Monocular Depth Estimation

**On the uncertainty of self-supervised monocular depth estimation**

- Paper: https://arxiv.org/abs/2005.06209
- Code: https://github.com/mattpoggi/mono-uncertainty

**3D Packing for Self-Supervised Monocular Depth Estimation**

- Paper: https://arxiv.org/abs/1905.02693
- Code: https://github.com/TRI-ML/packnet-sfm
- Demo video: https://www.bilibili.com/video/av70562892/

**Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation**

- Paper: https://arxiv.org/abs/2002.12114
- Code: https://github.com/yzhao520/ARC

<a name="6DOF"></a>

# 6D Object Pose Estimation

**PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/He_PVN3D_A_Deep_Point-Wise_3D_Keypoints_Voting_Network_for_6DoF_CVPR_2020_paper.pdf
- Code: https://github.com/ethnhe/PVN3D

**MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion**

- Paper: https://arxiv.org/abs/2004.04336
- Code: https://github.com/wkentaro/morefusion

**EPOS: Estimating 6D Pose of Objects with Symmetries**

- Homepage: http://cmp.felk.cvut.cz/epos

- Paper: https://arxiv.org/abs/2004.00605

**G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features**

- Paper: https://arxiv.org/abs/2003.11089

- Code: https://github.com/DC1991/G2L_Net

<a name="Hand-Pose"></a>

# Hand Pose Estimation

**HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation**

- Paper: https://arxiv.org/abs/2004.00060

- Homepage: http://vision.sice.indiana.edu/projects/hopenet

**Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data**

- Paper: https://arxiv.org/abs/2003.09572

- Code: https://github.com/CalciferZh/minimal-hand

<a name="Saliency"></a>

# Saliency Detection

**JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection**

- Paper: https://arxiv.org/abs/2004.08515

- Code: https://github.com/kerenfu/JLDCF/

**UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders**

- Homepage: http://dpfan.net/d3netbenchmark/

- Paper: https://arxiv.org/abs/2004.05763
- Code: https://github.com/JingZhang617/UCNet

<a name="Denoising"></a>

# Denoising

**A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising**

- Paper: https://arxiv.org/abs/2003.12751

- Code: https://github.com/Vandermode/NoiseModel

**CycleISP: Real Image Restoration via Improved Data Synthesis**

- Paper: https://arxiv.org/abs/2003.07761

- Code: https://github.com/swz30/CycleISP

<a name="Deraining"></a>

# Deraining

**Multi-Scale Progressive Fusion Network for Single Image Deraining**

- Paper: https://arxiv.org/abs/2003.10985
- Code: https://github.com/kuihua/MSPFN

**Detail-recovery Image Deraining via Context Aggregation Networks**

- Paper: https://openaccess.thecvf.com/content_CVPR_2020/html/Deng_Detail-recovery_Image_Deraining_via_Context_Aggregation_Networks_CVPR_2020_paper.html
- Code: https://github.com/Dengsgithub/DRD-Net

<a name="Deblurring"></a>

# Deblurring

## Video Deblurring

**Cascaded Deep Video Deblurring Using Temporal Sharpness Prior**

- Homepage: https://csbhr.github.io/projects/cdvd-tsp/index.html
- Paper: https://arxiv.org/abs/2004.02501
- Code: https://github.com/csbhr/CDVD-TSP

<a name="Dehazing"></a>

# Dehazing

**Domain Adaptation for Image Dehazing**

- Paper: https://arxiv.org/abs/2005.04668

- Code: https://github.com/HUSTSYJ/DA_dahazing

**Multi-Scale Boosted Dehazing Network with Dense Feature Fusion**

- Paper: https://arxiv.org/abs/2004.13388

- Code: https://github.com/BookerDeWitt/MSBDN-DFF

<a name="Feature"></a>

# Feature Point Detection and Description

**ASLFeat: Learning Local Features of Accurate Shape and Localization**

- Paper: https://arxiv.org/abs/2003.10071

- Code: https://github.com/lzx551402/aslfeat

<a name="VQA"></a>

# Visual Question Answering (VQA)

**VC R-CNN: Visual Commonsense R-CNN**

- Paper: https://arxiv.org/abs/2002.12204
- Code: https://github.com/Wangt-CN/VC-R-CNN

<a name="VideoQA"></a>

# Video Question Answering (VideoQA)

**Hierarchical Conditional Relation Networks for Video Question Answering**

- Paper: https://arxiv.org/abs/2002.10698
- Code: https://github.com/thaolmk54/hcrn-videoqa

<a name="VLN"></a>

# Vision-and-Language Navigation

**Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training**

- Paper: https://arxiv.org/abs/2002.10638
- Code (coming soon): https://github.com/weituo12321/PREVALENT

<a name="Video-Compression"></a>

# Video Compression

**Learning for Video Compression with Hierarchical Quality and Recurrent Enhancement**

- Paper: https://arxiv.org/abs/2003.01966
- Code: https://github.com/RenYang-home/HLVC

<a name="Video-Frame-Interpolation"></a>

# Video Frame Interpolation

**AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation**

- Paper: https://arxiv.org/abs/1907.10244
- Code: https://github.com/HyeongminLEE/AdaCoF-pytorch

**FeatureFlow: Robust Video Interpolation via Structure-to-Texture Generation**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Gui_FeatureFlow_Robust_Video_Interpolation_via_Structure-to-Texture_Generation_CVPR_2020_paper.html

- Code: https://github.com/CM-BF/FeatureFlow

**Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution**

- Paper: https://arxiv.org/abs/2002.11616
- Code: https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020

**Space-Time-Aware Multi-Resolution Video Enhancement**

- Homepage: https://alterzero.github.io/projects/STAR.html
- Paper: http://arxiv.org/abs/2003.13170
- Code: https://github.com/alterzero/STARnet

**Scene-Adaptive Video Frame Interpolation via Meta-Learning**

- Paper: https://arxiv.org/abs/2004.00779
- Code: https://github.com/myungsub/meta-interpolation

**Softmax Splatting for Video Frame Interpolation**

- Homepage: http://sniklaus.com/papers/softsplat
- Paper: https://arxiv.org/abs/2003.05534
- Code: https://github.com/sniklaus/softmax-splatting

<a name="Style-Transfer"></a>

# Style Transfer

**Diversified Arbitrary Style Transfer via Deep Feature Perturbation**

- Paper: https://arxiv.org/abs/1909.08223
- Code: https://github.com/EndyWon/Deep-Feature-Perturbation

**Collaborative Distillation for Ultra-Resolution Universal Style Transfer**

- Paper: https://arxiv.org/abs/2003.08436

- Code: https://github.com/mingsun-tse/collaborative-distillation

<a name="Lane-Detection"></a>

# Lane Detection

**Inter-Region Affinity Distillation for Road Marking Segmentation**

- Paper: https://arxiv.org/abs/2004.05304
- Code: https://github.com/cardwing/Codes-for-IntRA-KD

<a name="HOI"></a>

# Human-Object Interaction (HOI) Detection

**PPDM: Parallel Point Detection and Matching for Real-time Human-Object Interaction Detection**

- Paper: https://arxiv.org/abs/1912.12898
- Code: https://github.com/YueLiao/PPDM

**Detailed 2D-3D Joint Representation for Human-Object Interaction**

- Paper: https://arxiv.org/abs/2004.08154

- Code: https://github.com/DirtyHarryLYL/DJ-RN

**Cascaded Human-Object Interaction Recognition**

- Paper: https://arxiv.org/abs/2003.04262

- Code: https://github.com/tfzhou/C-HOI

**VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions**

- Paper: https://arxiv.org/abs/2003.05541
- Code: https://github.com/ASMIftekhar/VSGNet

<a name="TP"></a>

# Trajectory Prediction

**The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction**

- Paper: https://arxiv.org/abs/1912.06445
- Code: https://github.com/JunweiLiang/Multiverse
- Dataset: https://next.cs.cmu.edu/multiverse/

**Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction**

- Paper: https://arxiv.org/abs/2002.11927
- Code: https://github.com/abduallahmohamed/Social-STGCNN

<a name="Motion-Predication"></a>

# Motion Prediction

**Collaborative Motion Prediction via Neural Motion Message Passing**

- Paper: https://arxiv.org/abs/2003.06594
- Code: https://github.com/PhyllisH/NMMP

**MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps**

- Paper: https://arxiv.org/abs/2003.06754

- Code: https://github.com/pxiangwu/MotionNet

<a name="OF"></a>

# Optical Flow Estimation

**Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation**

- Paper: https://arxiv.org/abs/2003.13045
- Code: https://github.com/lliuz/ARFlow

<a name="IR"></a>

# Image Retrieval

**Evade Deep Image Retrieval by Stashing Private Images in the Hash Space**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Xiao_Evade_Deep_Image_Retrieval_by_Stashing_Private_Images_in_the_CVPR_2020_paper.html
- Code: https://github.com/sugarruy/hashstash

<a name="Virtual-Try-On"></a>

# Virtual Try-On

**Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content**

- Paper: https://arxiv.org/abs/2003.05863
- Code: https://github.com/switchablenorms/DeepFashion_Try_On

<a name="HDR"></a>

# HDR

**Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline**

- Homepage: https://www.cmlab.csie.ntu.edu.tw/~yulunliu/SingleHDR

- Paper: https://www.cmlab.csie.ntu.edu.tw/~yulunliu/SingleHDR_/00942.pdf

- Code: https://github.com/alex04072000/SingleHDR

<a name="AE"></a>

# Adversarial Examples

**Enhancing Cross-Task Black-Box Transferability of Adversarial Examples With Dispersion Reduction**

- Paper: https://openaccess.thecvf.com/content_CVPR_2020/papers/Lu_Enhancing_Cross-Task_Black-Box_Transferability_of_Adversarial_Examples_With_Dispersion_Reduction_CVPR_2020_paper.pdf
- Code: https://github.com/erbloo/dr_cvpr20

**Towards Large yet Imperceptible Adversarial Image Perturbations with Perceptual Color Distance**

- Paper: https://arxiv.org/abs/1911.02466
- Code: https://github.com/ZhengyuZhao/PerC-Adversarial

<a name="3D-Reconstructing"></a>

# 3D Reconstruction

**Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild**

- **CVPR 2020 Best Paper**
- Homepage: https://elliottwu.com/projects/unsup3d/
- Paper: https://arxiv.org/abs/1911.11130
- Code: https://github.com/elliottwu/unsup3d

**Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization**

- Homepage: https://shunsukesaito.github.io/PIFuHD/
- Paper: https://arxiv.org/abs/2004.00452
- Code: https://github.com/facebookresearch/pifuhd

**TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Patel_TailorNet_Predicting_Clothing_in_3D_as_a_Function_of_Human_CVPR_2020_paper.pdf
- Code: https://github.com/chaitanya100100/TailorNet
- Dataset: https://github.com/zycliao/TailorNet_dataset

**Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Chibane_Implicit_Functions_in_Feature_Space_for_3D_Shape_Reconstruction_and_CVPR_2020_paper.pdf
- Code: https://github.com/jchibane/if-net

**Learning to Transfer Texture from Clothing Images to 3D Humans**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Mir_Learning_to_Transfer_Texture_From_Clothing_Images_to_3D_Humans_CVPR_2020_paper.pdf
- Code: https://github.com/aymenmir1/pix2surf

<a name="DC"></a>

# Depth Completion

**Uncertainty-Aware CNNs for Depth Completion: Uncertainty from Beginning to End**

- Paper: https://arxiv.org/abs/2006.03349

- Code: https://github.com/abdo-eldesokey/pncnn

<a name="SSC"></a>

# Semantic Scene Completion

**3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior**

- Paper: https://arxiv.org/abs/2003.14052
- Code: https://github.com/charlesCXK/TorchSSC

<a name="Captioning"></a>

# Image/Video Captioning

**Syntax-Aware Action Targeting for Video Captioning**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Zheng_Syntax-Aware_Action_Targeting_for_Video_Captioning_CVPR_2020_paper.pdf
- Code: https://github.com/SydCaption/SAAT

<a name="WP"></a>

# Wireframe Parsing

**Holistically-Attracted Wireframe Parsing**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Xue_Holistically-Attracted_Wireframe_Parsing_CVPR_2020_paper.html

- Code: https://github.com/cherubicXN/hawp

<a name="Datasets"></a>

# Datasets

**OASIS: A Large-Scale Dataset for Single Image 3D in the Wild**

- Paper: https://arxiv.org/abs/2007.13215
- Dataset: https://oasis.cs.princeton.edu/

**STEFANN: Scene Text Editor using Font Adaptive Neural Network**

- Homepage: https://prasunroy.github.io/stefann/

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Roy_STEFANN_Scene_Text_Editor_Using_Font_Adaptive_Neural_Network_CVPR_2020_paper.html
- Code: https://github.com/prasunroy/stefann
- Dataset: https://drive.google.com/open?id=1sEDiX_jORh2X-HSzUnjIyZr-G9LJIw1k

**Interactive Object Segmentation with Inside-Outside Guidance**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Interactive_Object_Segmentation_With_Inside-Outside_Guidance_CVPR_2020_paper.pdf
- Code: https://github.com/shiyinzhang/Inside-Outside-Guidance
- Dataset: https://github.com/shiyinzhang/Pixel-ImageNet

**Video Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2006.11339
- Code: https://github.com/mcahny/vps
- Dataset: https://www.dropbox.com/s/ecem4kq0fdkver4/cityscapes-vps-dataset-1.0.zip?dl=0

**FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Li_FSS-1000_A_1000-Class_Dataset_for_Few-Shot_Segmentation_CVPR_2020_paper.html

- Code: https://github.com/HKUSTCV/FSS-1000

- Dataset: https://github.com/HKUSTCV/FSS-1000

**3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset**

- Homepage: https://vap.aau.dk/3d-zef/
- Paper: https://arxiv.org/abs/2006.08466
- Code: https://bitbucket.org/aauvap/3d-zef/src/master/
- Dataset: https://motchallenge.net/data/3D-ZeF20

**TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Patel_TailorNet_Predicting_Clothing_in_3D_as_a_Function_of_Human_CVPR_2020_paper.pdf
- Code: https://github.com/chaitanya100100/TailorNet
- Dataset: https://github.com/zycliao/TailorNet_dataset

**Oops! Predicting Unintentional Action in Video**

- Homepage: https://oops.cs.columbia.edu/

- Paper: https://arxiv.org/abs/1911.11206
- Code: https://github.com/cvlab-columbia/oops
- Dataset: https://oops.cs.columbia.edu/data

**The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction**

- Paper: https://arxiv.org/abs/1912.06445
- Code: https://github.com/JunweiLiang/Multiverse
- Dataset: https://next.cs.cmu.edu/multiverse/

**Open Compound Domain Adaptation**

- Homepage: https://liuziwei7.github.io/projects/CompoundDomain.html
- Dataset: https://drive.google.com/drive/folders/1_uNTF8RdvhS_sqVTnYx17hEOQpefmE2r?usp=sharing
- Paper: https://arxiv.org/abs/1909.03403
- Code: https://github.com/zhmiao/OpenCompoundDomainAdaptation-OCDA

**Intra- and Inter-Action Understanding via Temporal Action Parsing**

- Paper: https://arxiv.org/abs/2005.10229
- Homepage and dataset: https://sdolivia.github.io/TAPOS/

**Dynamic Refinement Network for Oriented and Densely Packed Object Detection**

- Paper: https://arxiv.org/abs/2005.09973

- Code and dataset: https://github.com/Anymake/DRN_CVPR2020

**COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification**

- Paper: https://arxiv.org/abs/2005.07862

- Dataset: not yet available

**KeypointNet: A Large-scale 3D Keypoint Dataset Aggregated from Numerous Human Annotations**

- Paper: https://arxiv.org/abs/2002.12687

- Dataset: https://github.com/qq456cvb/KeypointNet

**MSeg: A Composite Dataset for Multi-domain Semantic Segmentation**

- Paper: http://vladlen.info/papers/MSeg.pdf
- Code: https://github.com/mseg-dataset/mseg-api
- Dataset: https://github.com/mseg-dataset/mseg-semantic

**AvatarMe: Realistically Renderable 3D Facial Reconstruction "in-the-wild"**

- Paper: https://arxiv.org/abs/2003.13845
- Dataset: https://github.com/lattas/AvatarMe

**Learning to Autofocus**

- Paper: https://arxiv.org/abs/2004.12260
- Dataset: not yet available

**FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction**

- Paper: https://arxiv.org/abs/2003.13989
- Code: https://github.com/zhuhao-nju/facescape

**Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data**

- Paper: https://arxiv.org/abs/2004.01166

- Code: https://github.com/Healthcare-Robotics/bodies-at-rest
- Dataset: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KOA4ML

**FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding**

- Homepage: https://sdolivia.github.io/FineGym/
- Paper: https://arxiv.org/abs/2004.06704

**A Local-to-Global Approach to Multi-modal Movie Scene Segmentation**

- Homepage: https://anyirao.com/projects/SceneSeg.html

- Paper: https://arxiv.org/abs/2004.02678

- Code: https://github.com/AnyiRao/SceneSeg

**Deep Homography Estimation for Dynamic Scenes**

- Paper: https://arxiv.org/abs/2004.02132

- Dataset: https://github.com/lcmhoang/hmg-dynamics

**Assessing Image Quality Issues for Real-World Problems**

- Homepage: https://vizwiz.org/tasks-and-datasets/image-quality-issues/
- Paper: https://arxiv.org/abs/2003.12511

**UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**

- Paper: https://arxiv.org/abs/2003.10608
- Code and dataset: https://github.com/Jyouhou/UnrealText/

**PANDA: A Gigapixel-level Human-centric Video Dataset**

- Paper: https://arxiv.org/abs/2003.04852

- Dataset: http://www.panda-dataset.com/

**IntrA: 3D Intracranial Aneurysm Dataset for Deep Learning**

- Paper: https://arxiv.org/abs/2003.02920
- Dataset: https://github.com/intra3d2019/IntrA

**Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS**

- Paper: https://arxiv.org/abs/2003.03972
- Dataset: not yet available

<a name="Others"></a>

# Others

**CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Kluger_CONSAC_Robust_Multi-Model_Fitting_by_Conditional_Sample_Consensus_CVPR_2020_paper.html
- Code: https://github.com/fkluger/consac

**Learning to Learn Single Domain Generalization**

- Paper: https://arxiv.org/abs/2003.13216
- Code: https://github.com/joffery/M-ADA

**Open Compound Domain Adaptation**

- Homepage: https://liuziwei7.github.io/projects/CompoundDomain.html
- Dataset: https://drive.google.com/drive/folders/1_uNTF8RdvhS_sqVTnYx17hEOQpefmE2r?usp=sharing
- Paper: https://arxiv.org/abs/1909.03403
- Code: https://github.com/zhmiao/OpenCompoundDomainAdaptation-OCDA

**Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision**

- Paper: http://www.cvlibs.net/publications/Niemeyer2020CVPR.pdf

- Code: https://github.com/autonomousvision/differentiable_volumetric_rendering

**QEBA: Query-Efficient Boundary-Based Blackbox Attack**

- Paper: https://arxiv.org/abs/2005.14137
- Code: https://github.com/AI-secure/QEBA

**Equalization Loss for Long-Tailed Object Recognition**

- Paper: https://arxiv.org/abs/2003.05176
- Code: https://github.com/tztztztztz/eql.detectron2

**Instance-aware Image Colorization**

- Homepage: https://ericsujw.github.io/InstColorization/
- Paper: https://arxiv.org/abs/2005.10825
- Code: https://github.com/ericsujw/InstColorization

**Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting**

- Paper: https://arxiv.org/abs/2005.09704

- Code: https://github.com/Atlas200dk/sample-imageinpainting-HiFill

**Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching**

- Paper: https://arxiv.org/abs/2005.03860
- Code: https://github.com/shiyujiao/cross_view_localization_DSM

**Epipolar Transformers**

- Paper: https://arxiv.org/abs/2005.04551

- Code: https://github.com/yihui-he/epipolar-transformers

**Bringing Old Photos Back to Life**

- Homepage: http://raywzy.com/Old_Photo/
- Paper: https://arxiv.org/abs/2004.09484

**MaskFlownet: Asymmetric Feature Matching with Learnable Occlusion Mask**

- Paper: https://arxiv.org/abs/2003.10955

- Code: https://github.com/microsoft/MaskFlownet

**Self-Supervised Viewpoint Learning from Image Collections**

- Paper: https://arxiv.org/abs/2004.01793
- Paper (alternate link): https://research.nvidia.com/sites/default/files/pubs/2020-03_Self-Supervised-Viewpoint-Learning/SSV-CVPR2020.pdf
- Code: https://github.com/NVlabs/SSV

**Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations**

- Oral

- Paper: https://arxiv.org/abs/2003.12237
- Code: https://github.com/cuishuhao/BNM

**Towards Learning Structure via Consensus for Face Segmentation and Parsing**

- Paper: https://arxiv.org/abs/1911.00957
- Code: https://github.com/isi-vista/structure_via_consensus

**Plug-and-Play Algorithms for Large-scale Snapshot Compressive Imaging**

- Oral
- Paper: https://arxiv.org/abs/2003.13654

- Code: https://github.com/liuyang12/PnP-SCI

**Lightweight Photometric Stereo for Facial Details Recovery**

- Paper: https://arxiv.org/abs/2003.12307
- Code: https://github.com/Juyong/FacePSNet

**Footprints and Free Space from a Single Color Image**

- Paper: https://arxiv.org/abs/2004.06376

- Code: https://github.com/nianticlabs/footprints

**Self-Supervised Monocular Scene Flow Estimation**

- Paper: https://arxiv.org/abs/2004.04143
- Code: https://github.com/visinf/self-mono-sf

**Quasi-Newton Solver for Robust Non-Rigid Registration**

- Paper: https://arxiv.org/abs/2004.04322
- Code: https://github.com/Juyong/Fast_RNRR

**A Local-to-Global Approach to Multi-modal Movie Scene Segmentation**

- Homepage: https://anyirao.com/projects/SceneSeg.html

- Paper: https://arxiv.org/abs/2004.02678

- Code: https://github.com/AnyiRao/SceneSeg

**DeepFLASH: An Efficient Network for Learning-based Medical Image Registration**

- Paper: https://arxiv.org/abs/2004.02097

- Code: https://github.com/jw4hv/deepflash

**Self-Supervised Scene De-occlusion**

- Homepage: https://xiaohangzhan.github.io/projects/deocclusion/
- Paper: https://arxiv.org/abs/2004.02788
- Code: https://github.com/XiaohangZhan/deocclusion

**Polarized Reflection Removal with Perfect Alignment in the Wild**

- Homepage: https://leichenyang.weebly.com/project-polarized.html
- Code: https://github.com/ChenyangLEI/CVPR2020-Polarized-Reflection-Removal-with-Perfect-Alignment
**Background Matting: The World is Your Green Screen**

- Paper: https://arxiv.org/abs/2004.00626
- Code: http://github.com/senguptaumd/Background-Matting

**What Deep CNNs Benefit from Global Covariance Pooling: An Optimization Perspective**

- Paper: https://arxiv.org/abs/2003.11241

- Code: https://github.com/ZhangLi-CS/GCP_Optimization

**Look-into-Object: Self-supervised Structure Modeling for Object Recognition**

- Paper: not yet available
- Code: https://github.com/JDAI-CV/LIO

**Video Object Grounding using Semantic Roles in Language Description**

- Paper: https://arxiv.org/abs/2003.10606
- Code: https://github.com/TheShadow29/vognet-pytorch

**Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives**

- Paper: https://arxiv.org/abs/2003.10739
- Code: https://github.com/d-li14/DHM

**SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization**

- Paper: http://www.cs.umd.edu/~yuejiang/papers/SDFDiff.pdf
- Code: https://github.com/YueJiang-nj/CVPR2020-SDFDiff

**On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location**

- Paper: https://arxiv.org/abs/2003.07064

- Code: https://github.com/oskyhn/CNNs-Without-Borders

**GhostNet: More Features from Cheap Operations**

- Paper: https://arxiv.org/abs/1911.11907

- Code: https://github.com/iamhankai/ghostnet

**AdderNet: Do We Really Need Multiplications in Deep Learning?**

- Paper: https://arxiv.org/abs/1912.13200
- Code: https://github.com/huawei-noah/AdderNet

**Deep Image Harmonization via Domain Verification**

- Paper: https://arxiv.org/abs/1911.13239
- Code: https://github.com/bcmi/Image_Harmonization_Datasets

**Blurry Video Frame Interpolation**

- Paper: https://arxiv.org/abs/2002.12259
- Code: https://github.com/laomao0/BIN

**Extremely Dense Point Correspondences using a Learned Feature Descriptor**

- Paper: https://arxiv.org/abs/2003.00619
- Code: https://github.com/lppllppl920/DenseDescriptorLearning-Pytorch

**Filter Grafting for Deep Neural Networks**

- Paper: https://arxiv.org/abs/2001.05868
- Code: https://github.com/fxmeng/filter-grafting
- Paper walkthrough (Zhihu): https://www.zhihu.com/question/372070853/answer/1041569335

**Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation**

- Paper: https://arxiv.org/abs/2003.02824
- Code: https://github.com/cmhungsteve/SSTDA

**Detecting Attended Visual Targets in Video**

- Paper: https://arxiv.org/abs/2003.02501

- Code: https://github.com/ejcgt/attention-target-detection

**Deep Image Spatial Transformation for Person Image Generation**

- Paper: https://arxiv.org/abs/2003.00696
- Code: https://github.com/RenYurui/Global-Flow-Local-Attention

**Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications**

- Paper: https://arxiv.org/abs/2003.01455
- Code: https://github.com/bbrattoli/ZeroShotVideoClassification

https://github.com/charlesCXK/3D-SketchAware-SSC

https://github.com/Anonymous20192020/Anonymous_CVPR5767

https://github.com/avirambh/ScopeFlow

https://github.com/csbhr/CDVD-TSP

https://github.com/ymcidence/TBH

https://github.com/yaoyao-liu/mnemonics

https://github.com/meder411/Tangent-Images

https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch

https://github.com/sjmoran/deep_local_parametric_filters

https://github.com/bermanmaxim/AOWS

https://github.com/dc3ea9f/look-into-object 

<a name="Not-Sure"></a>

# Acceptance Uncertain

**FADNet: A Fast and Accurate Network for Disparity Estimation**

- Paper: not yet released
- Code: https://github.com/HKBU-HPML/FADNet

https://github.com/rFID-submit/RandomFID (acceptance uncertain)

https://github.com/JackSyu/AE-MSR (acceptance uncertain)

https://github.com/fastconvnets/cvpr2020 (acceptance uncertain)

https://github.com/aimagelab/meshed-memory-transformer (acceptance uncertain)

https://github.com/TWSFar/CRGNet (acceptance uncertain)

https://github.com/CVPR-2020/CDARTS (acceptance uncertain)

https://github.com/anucvml/ddn-cvprw2020 (acceptance uncertain)

https://github.com/dl-model-recommend/model-trust (acceptance uncertain)

https://github.com/apratimbhattacharyya18/CVPR-2020-Corr-Prior (acceptance uncertain)

https://github.com/onetcvpr/O-Net (acceptance uncertain)

https://github.com/502463708/Microcalcification_Detection (acceptance uncertain)

https://github.com/anonymous-for-review/cvpr-2020-deep-smoke-machine (acceptance uncertain)

https://github.com/anonymous-for-review/cvpr-2020-smoke-recognition-dataset (acceptance uncertain)

https://github.com/cvpr-nonrigid/dataset (acceptance uncertain)

https://github.com/theFool32/PPBA (acceptance uncertain)

https://github.com/Realtime-Action-Recognition/Realtime-Action-Recognition

================================================
FILE: CVPR2021-Papers-with-Code.md
================================================
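# CVPR 2021 Papers and Open-Source Projects (Papers with Code)

A collection of [CVPR 2021](http://cvpr2021.thecvf.com/) papers and open-source projects (papers with code)!

CVPR 2021 accepted paper list: http://cvpr2021.thecvf.com/sites/default/files/2021-03/accepted_paper_ids.txt

> Note 1: Everyone is welcome to open an issue to share CVPR 2021 papers and open-source projects!
>
> Note 2: For papers from previous top CV conferences, plus other high-quality CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision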

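If you want to keep up with the latest and best CV papers, open-source projects, and learning resources, scan the QR code to join the CVer academic exchange group! Let's learn from each other and improve together.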

![](CVer学术交流群.png)

## CVPR 2021 Papers with Code: Table of Contents

- [Best Paper](#Best-Paper)
- [Backbone](#Backbone)
- [NAS](#NAS)
- [GAN](#GAN)
- [VAE](#VAE)
- [Visual Transformer](#Visual-Transformer)
- [Regularization](#Regularization)
- [SLAM](#SLAM)
- [Long-Tailed](#Long-Tailed)
- [Data Augmentation](#DA)
- [Un/Self-Supervised Learning](#Un/Self-Supervised)
- [Semi-Supervised Learning](#Semi-Supervised)
- [Capsule Network](#Capsule-Network)
- [Image Classification](#Image-Classification)
- [2D Object Detection](#Object-Detection)
- [Single/Multi-Object Tracking](#Object-Tracking)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Panoptic Segmentation](#Panoptic-Segmentation)
- [Medical Image Segmentation](#Medical-Image-Segmentation)
- [Video Object Segmentation](#VOS)
- [Interactive Video Object Segmentation](#IVOS)
- [Saliency Detection](#Saliency-Detection)
- [Camouflaged Object Detection](#Camouflaged-Object-Detection)
- [Co-Salient Object Detection](#CoSOD)
- [Image Matting](#Matting)
- [Person Re-identification](#Re-ID)
- [Person Search](#Person-Search)
- [Video Understanding / Action Recognition](#Video-Understanding)
- [Face Recognition](#Face-Recognition)
- [Face Detection](#Face-Detection)
- [Face Anti-Spoofing](#Face-Anti-Spoofing)
- [Deepfake Detection](#Deepfake-Detection)
- [Facial Age Estimation](#Age-Estimation)
- [Facial Expression Recognition](#FER)
- [Deepfakes](#Deepfakes)
- [Human Parsing](#Human-Parsing)
- [2D/3D Human Pose Estimation](#Human-Pose-Estimation)
- [Animal Pose Estimation](#Animal-Pose-Estimation)
- [Hand Pose Estimation](#Hand-Pose-Estimation)
- [Human Volumetric Capture](#Human-Volumetric-Capture)
- [Scene Text Recognition](#Scene-Text-Recognition)
- [Image Compression](#Image-Compression)
- [Model Compression / Pruning / Quantization](#Model-Compression)
- [Knowledge Distillation](#KD)
- [Super-Resolution](#Super-Resolution)
- [Dehazing](#Dehazing)
- [Image Restoration](#Image-Restoration)
- [Image Inpainting](#Image-Inpainting)
- [Image Editing](#Image-Editing)
- [Image Captioning](#Image-Captioning)
- [Font Generation](#Font-Generation)
- [Image Matching](#Image-Matching)
- [Image Blending](#Image-Blending)
- [Reflection Removal](#Reflection-Removal)
- [3D Point Cloud Classification](#3D-C)
- [3D Object Detection](#3D-Object-Detection)
- [3D Semantic Segmentation](#3D-Semantic-Segmentation)
- [3D Panoptic Segmentation](#3D-Panoptic-Segmentation)
- [3D Object Tracking](#3D-Object-Tracking)
- [3D Point Cloud Registration](#3D-PointCloud-Registration)
- [3D Point Cloud Completion](#3D-Point-Cloud-Completion)
- [3D Reconstruction](#3D-Reconstruction)
- [6D Pose Estimation](#6D-Pose-Estimation)
- [Camera Pose Estimation](#Camera-Pose-Estimation)
- [Depth Estimation](#Depth-Estimation)
- [Stereo Matching](#Stereo-Matching)
- [Optical Flow Estimation](#Flow-Estimation)
- [Lane Detection](#Lane-Detection)
- [Trajectory Prediction](#Trajectory-Prediction)
- [Crowd Counting](#Crowd-Counting)
- [Adversarial Examples](#AE)
- [Image Retrieval](#Image-Retrieval)
- [Video Retrieval](#Video-Retrieval)
- [Cross-modal Retrieval](#Cross-modal-Retrieval)
- [Zero-Shot Learning](#Zero-Shot-Learning)
- [Federated Learning](#Federated-Learning)
- [Video Frame Interpolation](#Video-Frame-Interpolation)
- [Visual Reasoning](#Visual-Reasoning)
- [Image Synthesis](#Image-Synthesis)
- [View Synthesis](#Visual-Synthesis)
- [Style Transfer](#Style-Transfer)
- [Layout Generation](#Layout-Generation)
- [Domain Generalization](#Domain-Generalization)
- [Domain Adaptation](#Domain-Adaptation)
- [Open-Set](#Open-Set)
- [Adversarial Attack](#Adversarial-Attack)
- [Human-Object Interaction (HOI) Detection](#HOI)
- [Shadow Removal](#Shadow-Removal)
- [Virtual Try-On](#Virtual-Try-On)
- [Label Noise](#Label-Noise)
- [Video Stabilization](#Video-Stabilization)
- [Datasets](#Datasets)
- [Others](#Others)
- [TODO](#TO-DO)
- [Not Sure Whether Accepted](#Not-Sure)

<a name="Best-Paper"></a>

# Best Paper

**GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields**

- Homepage: https://m-niemeyer.github.io/project-pages/giraffe/index.html
- Paper(Oral): https://arxiv.org/abs/2011.12100

- Code: https://github.com/autonomousvision/giraffe

- Demo: http://www.youtube.com/watch?v=fIaDXC-qRSg&vq=hd1080&autoplay=1

<a name="Backbone"></a>

# Backbone

**HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers**

- Paper(Oral): https://arxiv.org/abs/2106.06560

- Code: https://github.com/dingmyu/HR-NAS

**BCNet: Searching for Network Width with Bilaterally Coupled Network**

- Paper: https://arxiv.org/abs/2105.10533
- Code: None

**Decoupled Dynamic Filter Networks**

- Homepage: https://thefoxofsky.github.io/project_pages/ddf
- Paper: https://arxiv.org/abs/2104.14107
- Code: https://github.com/thefoxofsky/DDF

**Lite-HRNet: A Lightweight High-Resolution Network**

- Paper: https://arxiv.org/abs/2104.06403
- Code: https://github.com/HRNet/Lite-HRNet

**CondenseNet V2: Sparse Feature Reactivation for Deep Networks**

- Paper: https://arxiv.org/abs/2104.04382

- Code: https://github.com/jianghaojun/CondenseNetV2

**Diverse Branch Block: Building a Convolution as an Inception-like Unit**

- Paper: https://arxiv.org/abs/2103.13425

- Code: https://github.com/DingXiaoH/DiverseBranchBlock

**Scaling Local Self-Attention For Parameter Efficient Visual Backbones**

- Paper(Oral): https://arxiv.org/abs/2103.12731

- Code: None

**ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network**

- Paper: https://arxiv.org/abs/2007.00992
- Code: https://github.com/clovaai/rexnet

**Involution: Inverting the Inherence of Convolution for Visual Recognition**

- Paper: https://arxiv.org/abs/2103.06255
- Code: https://github.com/d-li14/involution

**Coordinate Attention for Efficient Mobile Network Design**

- Paper: https://arxiv.org/abs/2103.02907
- Code: https://github.com/Andrew-Qibin/CoordAttention

**Inception Convolution with Efficient Dilation Search**

- Paper: https://arxiv.org/abs/2012.13587
- Code: https://github.com/yifan123/IC-Conv

**RepVGG: Making VGG-style ConvNets Great Again**

- Paper: https://arxiv.org/abs/2101.03697
- Code: https://github.com/DingXiaoH/RepVGG

<a name="NAS"></a>

# NAS

**HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers**

- Paper(Oral): https://arxiv.org/abs/2106.06560

- Code: https://github.com/dingmyu/HR-NAS

**BCNet: Searching for Network Width with Bilaterally Coupled Network**

- Paper: https://arxiv.org/abs/2105.10533
- Code: None

**ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search**

- Paper: https://arxiv.org/abs/2105.10154
- Code: None

**Combined Depth Space based Architecture Search For Person Re-identification**

- Paper: https://arxiv.org/abs/2104.04163
- Code: None

**DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation**

- Paper(Oral): https://arxiv.org/abs/2103.15954
- Code: None

**Neural Architecture Search with Random Labels**

- Paper: https://arxiv.org/abs/2101.11834
- Code: None

**Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search**

- Paper: https://arxiv.org/abs/2101.11342
- Code: None

**Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation**

- Paper: https://arxiv.org/abs/2105.12971
- Code: None

**Prioritized Architecture Sampling with Monto-Carlo Tree Search**

- Paper: https://arxiv.org/abs/2103.11922
- Code: https://github.com/xiusu/NAS-Bench-Macro

**Contrastive Neural Architecture Search with Neural Architecture Comparators**

- Paper: https://arxiv.org/abs/2103.05471
- Code: https://github.com/chenyaofo/CTNAS

**AttentiveNAS: Improving Neural Architecture Search via Attentive** 

- Paper: https://arxiv.org/abs/2011.09011
- Code: None

**ReNAS: Relativistic Evaluation of Neural Architecture Search**

- Paper: https://arxiv.org/abs/1910.01523
- Code: None

**HourNAS: Extremely Fast Neural Architecture**

- Paper: https://arxiv.org/abs/2005.14446
- Code: None

**Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator**

- Paper: https://arxiv.org/abs/2103.07289
- Code: https://github.com/eric8607242/SGNAS

**OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection**

- Paper: https://arxiv.org/abs/2103.04507
- Code: https://github.com/VDIGPKU/OPANAS

**Inception Convolution with Efficient Dilation Search**

- Paper: https://arxiv.org/abs/2012.13587
- Code: None

<a name="GAN"></a>

# GAN

**High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network**

- Paper: https://arxiv.org/abs/2105.09188
- Code: https://github.com/csjliang/LPTN
- Dataset: https://github.com/csjliang/LPTN

**DG-Font: Deformable Generative Networks for Unsupervised Font Generation**

- Paper: https://arxiv.org/abs/2104.03064

- Code: https://github.com/ecnuycxie/DG-Font

**PD-GAN: Probabilistic Diverse GAN for Image Inpainting**

- Paper: https://arxiv.org/abs/2105.02201
- Code: https://github.com/KumapowerLIU/PD-GAN

**StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing**

- Paper: https://arxiv.org/abs/2104.14754
- Code: https://github.com/naver-ai/StyleMapGAN
- Demo Video: https://youtu.be/qCapNyRA_Ng

**Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer**

- Paper: https://arxiv.org/abs/2104.05376
- Code: https://github.com/PaddlePaddle/PaddleGAN/

**Regularizing Generative Adversarial Networks under Limited Data**

- Homepage: https://hytseng0509.github.io/lecam-gan/
- Paper: https://faculty.ucmerced.edu/mhyang/papers/cvpr2021_gan_limited_data.pdf
- Code: https://github.com/google/lecam-gan

**Towards Real-World Blind Face Restoration with Generative Facial Prior**

- Paper: https://arxiv.org/abs/2101.04061
- Code: None

**TediGAN: Text-Guided Diverse Image Generation and Manipulation**

- Homepage: https://xiaweihao.com/projects/tedigan/

- Paper: https://arxiv.org/abs/2012.03308
- Code: https://github.com/weihaox/TediGAN

**Generative Hierarchical Features from Synthesizing Image**

- Homepage: https://genforce.github.io/ghfeat/

- Paper(Oral): https://arxiv.org/abs/2007.10379
- Code: https://github.com/genforce/ghfeat

**Teachers Do More Than Teach: Compressing Image-to-Image Models**

- Paper: https://arxiv.org/abs/2103.03467
- Code: https://github.com/snap-research/CAT

**HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms**

- Paper: https://arxiv.org/abs/2011.11731
- Code: https://github.com/mahmoudnafifi/HistoGAN

**pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis**

- Homepage: https://marcoamonteiro.github.io/pi-GAN-website/

- Paper(Oral): https://arxiv.org/abs/2012.00926
- Code: None

**DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network**

- Paper: https://arxiv.org/abs/2103.07893
- Code: None

**Diverse Semantic Image Synthesis via Probability Distribution Modeling**

- Paper: https://arxiv.org/abs/2103.06878
- Code: https://github.com/tzt101/INADE.git

**LOHO: Latent Optimization of Hairstyles via Orthogonalization**

- Paper: https://arxiv.org/abs/2103.03891
- Code: None

**PISE: Person Image Synthesis and Editing with Decoupled GAN**

- Paper: https://arxiv.org/abs/2103.04023
- Code: https://github.com/Zhangjinso/PISE

**DeFLOCNet: Deep Image Editing via Flexible Low-level Controls**

- Paper: http://raywzy.com/
- Code: http://raywzy.com/

**Efficient Conditional GAN Transfer with Knowledge Propagation across Classes**

- Paper: https://www.researchgate.net/publication/349309756_Efficient_Conditional_GAN_Transfer_with_Knowledge_Propagation_across_Classes
- Code: http://github.com/mshahbazi72/cGANTransfer

**Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs**

- Paper: https://arxiv.org/abs/2011.14107
- Code: None

**Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation**

- Homepage: https://eladrich.github.io/pixel2style2pixel/
- Paper: https://arxiv.org/abs/2008.00951
- Code: https://github.com/eladrich/pixel2style2pixel

**A 3D GAN for Improved Large-pose Facial Recognition**

- Paper: https://arxiv.org/abs/2012.10545
- Code: None

**HumanGAN: A Generative Model of Humans Images**

- Paper: https://arxiv.org/abs/2103.06902
- Code: None

**ID-Unet: Iterative Soft and Hard Deformation for View Synthesis**

- Paper: https://arxiv.org/abs/2103.02264
- Code: https://github.com/MingyuY/Iterative-view-synthesis

**CoMoGAN: continuous model-guided image-to-image translation**

- Paper(Oral): https://arxiv.org/abs/2103.06879
- Code: https://github.com/cv-rits/CoMoGAN

**Training Generative Adversarial Networks in One Stage**

- Paper: https://arxiv.org/abs/2103.00430
- Code: None

**Closed-Form Factorization of Latent Semantics in GANs**

- Homepage: https://genforce.github.io/sefa/
- Paper(Oral): https://arxiv.org/abs/2007.06600
- Code: https://github.com/genforce/sefa

**Anycost GANs for Interactive Image Synthesis and Editing**

- Paper: https://arxiv.org/abs/2103.03243
- Code: https://github.com/mit-han-lab/anycost-gan

**Image-to-image Translation via Hierarchical Style Disentanglement**

- Paper: https://arxiv.org/abs/2103.01456
- Code: https://github.com/imlixinyang/HiSD

<a name="VAE"></a>

# VAE

**Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders**

- Homepage: https://taldatech.github.io/soft-intro-vae-web/

- Paper: https://arxiv.org/abs/2012.13253
- Code: https://github.com/taldatech/soft-intro-vae-pytorch

<a name="Visual-Transformer"></a>

# Visual Transformer

**1. End-to-End Human Pose and Mesh Reconstruction with Transformers**

- Paper: https://arxiv.org/abs/2012.09760
- Code: https://github.com/microsoft/MeshTransformer

**2. Temporal-Relational CrossTransformers for Few-Shot Action Recognition**

- Paper: https://arxiv.org/abs/2101.06184
- Code: https://github.com/tobyperrett/trx

**3. Kaleido-BERT: Vision-Language Pre-training on Fashion Domain**

- Paper: https://arxiv.org/abs/2103.16110
- Code: https://github.com/mczhuge/Kaleido-BERT

**4. HOTR: End-to-End Human-Object Interaction Detection with Transformers**

- Paper: https://arxiv.org/abs/2104.13682
- Code: https://github.com/kakaobrain/HOTR

**5. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving**

- Paper: https://arxiv.org/abs/2104.09224
- Code: https://github.com/autonomousvision/transfuser

**6. Pose Recognition with Cascade Transformers**

- Paper: https://arxiv.org/abs/2104.06976

- Code: https://github.com/mlpc-ucsd/PRTR

**7. Variational Transformer Networks for Layout Generation**

- Paper: https://arxiv.org/abs/2104.02416
- Code: None

**8. LoFTR: Detector-Free Local Feature Matching with Transformers**

- Homepage: https://zju3dv.github.io/loftr/
- Paper: https://arxiv.org/abs/2104.00680
- Code: https://github.com/zju3dv/LoFTR

**9. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers**

- Paper: https://arxiv.org/abs/2012.15840
- Code: https://github.com/fudan-zvg/SETR

**10. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers**

- Paper: https://arxiv.org/abs/2103.16553
- Code: None

**11. Transformer Tracking**

- Paper: https://arxiv.org/abs/2103.15436
- Code: https://github.com/chenxin-dlut/TransT

**12. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers**

- Paper(Oral): https://arxiv.org/abs/2106.06560
- Code: https://github.com/dingmyu/HR-NAS

**13. MIST: Multiple Instance Spatial Transformer**

- Paper: https://arxiv.org/abs/1811.10725
- Code: None

**14. Multimodal Motion Prediction with Stacked Transformers**

- Paper: https://arxiv.org/abs/2103.11624
- Code: https://decisionforce.github.io/mmTransformer

**15. Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning**

- Paper: https://www.amazon.science/publications/revamping-cross-modal-recipe-retrieval-with-hierarchical-transformers-and-self-supervised-learning

- Code: https://github.com/amzn/image-to-recipe-transformers

**16. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking**

- Paper(Oral): https://arxiv.org/abs/2103.11681

- Code: https://github.com/594422814/TransformerTrack

**17. Pre-Trained Image Processing Transformer**

- Paper: https://arxiv.org/abs/2012.00364
- Code: None

**18. End-to-End Video Instance Segmentation with Transformers**

- Paper(Oral): https://arxiv.org/abs/2011.14503
- Code: https://github.com/Epiphqny/VisTR

**19. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers**

- Paper(Oral): https://arxiv.org/abs/2011.09094
- Code: https://github.com/dddzg/up-detr

**20. End-to-End Human Object Interaction Detection with HOI Transformer**

- Paper: https://arxiv.org/abs/2103.04503
- Code: https://github.com/bbepoch/HoiTransformer

**21. Transformer Interpretability Beyond Attention Visualization** 

- Paper: https://arxiv.org/abs/2012.09838
- Code: https://github.com/hila-chefer/Transformer-Explainability

**22. Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer**

- Paper: None
- Code: None

**23. LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity**

- Paper: None
- Code: None

**24. Line Segment Detection Using Transformers without Edges**

- Paper(Oral): https://arxiv.org/abs/2101.01909
- Code: None

**25. MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers**

- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_MaX-DeepLab_End-to-End_Panoptic_Segmentation_With_Mask_Transformers_CVPR_2021_paper.html
- Code: None

**26. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation**

- Paper(Oral): https://arxiv.org/abs/2101.08833
- Code: https://github.com/dukebw/SSTVOS

**27. Facial Action Unit Detection With Transformers**

- Paper: None
- Code: None

**28. Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition**

- Paper: None
- Code: None

**29. Lesion-Aware Transformers for Diabetic Retinopathy Grading**

- Paper: None
- Code: None

**30. Topological Planning With Transformers for Vision-and-Language Navigation**

- Paper: https://arxiv.org/abs/2012.05292
- Code: None

**31. Adaptive Image Transformer for One-Shot Object Detection**

- Paper: None
- Code: None

**32. Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos**

- Paper: None
- Code: None

**33. Taming Transformers for High-Resolution Image Synthesis**

- Homepage: https://compvis.github.io/taming-transformers/
- Paper(Oral): https://arxiv.org/abs/2012.09841
- Code: https://github.com/CompVis/taming-transformers

**34. Self-Supervised Video Hashing via Bidirectional Transformers**

- Paper: None
- Code: None

**35. Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos**

- Paper(Oral): https://hehefan.github.io/pdfs/p4transformer.pdf
- Code: None

**36. Gaussian Context Transformer**

- Paper: None
- Code: None

**37. General Multi-Label Image Classification With Transformers**

- Paper: https://arxiv.org/abs/2011.14027
- Code: None

**38. Bottleneck Transformers for Visual Recognition**

- Paper: https://arxiv.org/abs/2101.11605
- Code: None

**39. VLN BERT: A Recurrent Vision-and-Language BERT for Navigation**

- Paper(Oral): https://arxiv.org/abs/2011.13922
- Code: https://github.com/YicongHong/Recurrent-VLN-BERT

**40. Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling**

- Paper(Oral): https://arxiv.org/abs/2102.06183
- Code: https://github.com/jayleicn/ClipBERT

**41. Self-attention based Text Knowledge Mining for Text Detection**

- Paper: None
- Code: https://github.com/CVI-SZU/STKM

**42. SSAN: Separable Self-Attention Network for Video Representation Learning**

- Paper: None
- Code: None

**43. Scaling Local Self-Attention For Parameter Efficient Visual Backbones**

- Paper(Oral): https://arxiv.org/abs/2103.12731

- Code: None

<a name="Regularization"></a>

# Regularization

**Regularizing Neural Networks via Adversarial Model Perturbation**

- Paper: https://arxiv.org/abs/2010.04925
- Code: https://github.com/hiyouga/AMP-Regularizer

<a name="SLAM"></a>

# SLAM

**Differentiable SLAM-net: Learning Particle SLAM for Visual Navigation**

- Paper: https://arxiv.org/abs/2105.07593
- Code: None

**Generalizing to the Open World: Deep Visual Odometry with Online Adaptation**

- Paper: https://arxiv.org/abs/2103.15279
- Code: https://arxiv.org/abs/2103.15279

<a name="Long-Tailed"></a>

# Long-Tailed

**Adversarial Robustness under Long-Tailed Distribution**

- Paper(Oral): https://arxiv.org/abs/2104.02703
- Code: https://github.com/wutong16/Adversarial_Long-Tail 

**Distribution Alignment: A Unified Framework for Long-tail Visual Recognition**

- Paper: https://arxiv.org/abs/2103.16370
- Code: https://github.com/Megvii-BaseDetection/DisAlign

**Adaptive Class Suppression Loss for Long-Tail Object Detection**

- Paper: https://arxiv.org/abs/2104.00885
- Code: https://github.com/CASIA-IVA-Lab/ACSL

**Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification**

- Paper: https://arxiv.org/abs/2103.14267
- Code: None

<a name="DA"></a>

# Data Augmentation

**Scale-aware Automatic Augmentation for Object Detection**

- Paper: https://arxiv.org/abs/2103.17220

- Code: https://github.com/Jia-Research-Lab/SA-AutoAug

<a name="Un/Self-Supervised"></a>

# Un/Self-Supervised Learning

**Domain-Specific Suppression for Adaptive Object Detection**

- Paper: https://arxiv.org/abs/2105.03570
- Code: None

**A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning**

- Paper: https://arxiv.org/abs/2104.14558

- Code: https://github.com/facebookresearch/SlowFast

**Unsupervised Multi-Source Domain Adaptation for Person Re-Identification**

- Paper: https://arxiv.org/abs/2104.12961
- Code: None

**Self-supervised Video Representation Learning by Context and Motion Decoupling**

- Paper: https://arxiv.org/abs/2104.00862
- Code: None

**Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning**

- Homepage: https://fingerrec.github.io/index_files/jinpeng/papers/CVPR2021/project_website.html
- Paper: https://arxiv.org/abs/2009.05769
- Code: https://github.com/FingerRec/BE

**Spatially Consistent Representation Learning**

- Paper: https://arxiv.org/abs/2103.06122
- Code: None

**VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples**

- Paper: https://arxiv.org/abs/2103.05905
- Code: https://github.com/tinapan-pt/VideoMoCo

**Exploring Simple Siamese Representation Learning**

- Paper(Oral): https://arxiv.org/abs/2011.10566
- Code: None

**Dense Contrastive Learning for Self-Supervised Visual Pre-Training**

- Paper(Oral): https://arxiv.org/abs/2011.09157
- Code: https://github.com/WXinlong/DenseCL

<a name="Semi-Supervised"></a>

# Semi-Supervised Learning

**Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework**

- Affiliations: Alibaba
- Paper: https://arxiv.org/abs/2103.11402
- Code: None

**Adaptive Consistency Regularization for Semi-Supervised Transfer Learning**

- Paper: https://arxiv.org/abs/2103.02193
- Code: https://github.com/SHI-Labs/Semi-Supervised-Transfer-Learning

<a name="Capsule-Network"></a>

# Capsule Network

**Capsule Network is Not More Robust than Convolutional Network**

- Paper: https://arxiv.org/abs/2103.15459
- Code: None

<a name="Image-Classification"></a>

# Image Classification

**Correlated Input-Dependent Label Noise in Large-Scale Image Classification**

- Paper(Oral): https://arxiv.org/abs/2105.10305
- Code: https://github.com/google/uncertainty-baselines/tree/master/baselines/imagenet

<a name="Object-Detection"></a>

# 2D Object Detection

## 2D Object Detection

**1. Scaled-YOLOv4: Scaling Cross Stage Partial Network**

- Affiliations: Academia Sinica, Intel, Providence University
- Paper: https://arxiv.org/abs/2011.08036
- Code: https://github.com/WongKinYiu/ScaledYOLOv4
- Chinese write-up: [The official improved YOLOv4 is here! 55.8% AP! Up to 1774 FPS! Scaled-YOLOv4 is now open source!](https://mp.weixin.qq.com/s/AcrJPNoAVhn8cGBUGK7ekA)

**2. You Only Look One-level Feature**

- Affiliations: Chinese Academy of Sciences, University of Chinese Academy of Sciences, Megvii
- Paper: https://arxiv.org/abs/2103.09460
- Code: https://github.com/megvii-model/YOLOF
- Chinese write-up: [CVPR 2021 | No FPN! CAS & Megvii propose YOLOF: You Only Need to Look at One Level of Features](https://mp.weixin.qq.com/s/EJqAG1gTVaP2icI6QL742A)

**3. Sparse R-CNN: End-to-End Object Detection with Learnable Proposals**

- Affiliations: The University of Hong Kong, Tongji University, ByteDance AI Lab, UC Berkeley
- Paper: https://arxiv.org/abs/2011.12450
- Code: https://github.com/PeizeSun/SparseR-CNN
- Chinese write-up: [A new paradigm for object detection! HKU, Tongji and Berkeley propose Sparse R-CNN, and the code was just open-sourced!](https://mp.weixin.qq.com/s/P2Zgh1wTqf8L2976El5nfQ)

**4. End-to-End Object Detection with Fully Convolutional Network**

- Affiliations: Megvii, Xi'an Jiaotong University
- Paper: https://arxiv.org/abs/2012.03544
- Code: https://github.com/Megvii-BaseDetection/DeFCN

**5. Dynamic Head: Unifying Object Detection Heads with Attentions**

- Affiliations: Microsoft
- Paper: https://arxiv.org/abs/2106.08322
- Code: https://github.com/microsoft/DynamicHead
- Chinese write-up: [60.6 AP! A new COCO record! Microsoft proposes DyHead: unifying object detection heads with attention](https://mp.weixin.qq.com/s/uYPUqVXwNau71VAYW3bYIA)

**6. Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection**

- Affiliations: Nanjing University of Science and Technology, Momenta, Nanjing University, Tsinghua University
- Paper: https://arxiv.org/abs/2011.12885
- Code: https://github.com/implus/GFocalV2
- Chinese write-up: [CVPR 2021 | GFLV2: an honest object detection technique that boosts accuracy at no extra cost!](https://mp.weixin.qq.com/s/JB7k3NwXU-cDueg6w9mghQ)

**7. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers**

- Affiliations: South China University of Technology, Tencent WeChat AI
- Paper(Oral): https://arxiv.org/abs/2011.09094
- Code: https://github.com/dddzg/up-detr
- Chinese write-up: [CVPR 2021 Oral | Transformers strike again! SCUT and WeChat propose UP-DETR: unsupervised pre-training for detectors](https://mp.weixin.qq.com/s/Hprp7B16SGFhVEKXfKiRBQ)

**8. MobileDets: Searching for Object Detection Architectures for Mobile Accelerators**

- Affiliations: University of Wisconsin, Google
- Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Xiong_MobileDets_Searching_for_Object_Detection_Architectures_for_Mobile_Accelerators_CVPR_2021_paper.pdf
- Code: https://github.com/tensorflow/models/tree/master/research/object_detection

**9. Tracking Pedestrian Heads in Dense Crowd**

- Affiliations: University of Rennes 1
- Homepage: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sundararaman_Tracking_Pedestrian_Heads_in_Dense_Crowd_CVPR_2021_paper.html
- Code1: https://github.com/Sentient07/HeadHunter
- Code2: https://github.com/Sentient07/HeadHunter--T
- Dataset: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/

**10. Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation**

- Affiliations: HKUST, Huawei Noah's Ark Lab
- Paper: https://arxiv.org/abs/2105.12971
- Code: None

**11. PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery**

- Affiliations: A*STAR, Sichuan University, Nanyang Technological University
- Paper: https://arxiv.org/abs/2105.12990
- Code: None

**12. IQDet: Instance-wise Quality Distribution Sampling for Object Detection**

- Affiliations: Megvii
- Paper: https://arxiv.org/abs/2104.06936
- Code: None

**13. Multi-Scale Aligned Distillation for Low-Resolution Detection**

- Affiliations: The Chinese University of Hong Kong, Adobe Research, SmartMore
- Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf
- Code: https://github.com/Jia-Research-Lab/MSAD

**14. Adaptive Class Suppression Loss for Long-Tail Object Detection**

- Affiliations: Chinese Academy of Sciences, University of Chinese Academy of Sciences, ObjectEye, Peking University, Peng Cheng Laboratory, Nexwise
- Paper: https://arxiv.org/abs/2104.00885
- Code: https://github.com/CASIA-IVA-Lab/ACSL

**15. VarifocalNet: An IoU-aware Dense Object Detector**

- Affiliations: Queensland University of Technology, The University of Queensland
- Paper(Oral): https://arxiv.org/abs/2008.13367
- Code: https://github.com/hyz-xmaster/VarifocalNet

**16. OTA: Optimal Transport Assignment for Object Detection**

- Affiliations: Waseda University, Megvii
- Paper: https://arxiv.org/abs/2103.14259
- Code: https://github.com/Megvii-BaseDetection/OTA

**17. Distilling Object Detectors via Decoupled Features**

- Affiliations: Huawei Noah's Ark Lab, The University of Sydney
- Paper: https://arxiv.org/abs/2103.14475
- Code: https://github.com/ggjy/DeFeat.pytorch

**18. Robust and Accurate Object Detection via Adversarial Learning**

- Affiliations: Google, UCLA, UCSC
- Paper: https://arxiv.org/abs/2103.13886
- Code: None

**19. OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection**

- Affiliations: Peking University, Anyvision, Stony Brook University
- Paper: https://arxiv.org/abs/2103.04507
- Code: https://github.com/VDIGPKU/OPANAS

**20. Multiple Instance Active Learning for Object Detection**

- Affiliations: University of Chinese Academy of Sciences, Huawei Noah's Ark Lab, Tsinghua University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Yuan_Multiple_Instance_Active_Learning_for_Object_Detection_CVPR_2021_paper.pdf
- Code: https://github.com/yuantn/MI-AOD

**21. Towards Open World Object Detection**

- Affiliations: Indian Institute of Technology, MBZUAI, Australian National University, Linköping University
- Paper(Oral): https://arxiv.org/abs/2103.02603
- Code: https://github.com/JosephKJ/OWOD

**22. RankDetNet: Delving Into Ranking Constraints for Object Detection**

- Affiliations: Xilinx
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_RankDetNet_Delving_Into_Ranking_Constraints_for_Object_Detection_CVPR_2021_paper.html
- Code: None

## Rotated Object Detection

**23. Dense Label Encoding for Boundary Discontinuity Free Rotation Detection**

- Affiliations: Shanghai Jiao Tong University, University of Chinese Academy of Sciences
- Paper: https://arxiv.org/abs/2011.09670
- Code1: https://github.com/Thinklab-SJTU/DCL_RetinaNet_Tensorflow
- Code2: https://github.com/yangxue0827/RotationDetection 

**24. ReDet: A Rotation-equivariant Detector for Aerial Object Detection**

- Affiliations: Wuhan University
- Paper: https://arxiv.org/abs/2103.07733
- Code: https://github.com/csuhan/ReDet

**25. Beyond Bounding-Box: Convex-Hull Feature Adaptation for Oriented and Densely Packed Object Detection**

- Affiliations: University of Chinese Academy of Sciences, Tsinghua University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Guo_Beyond_Bounding-Box_Convex-Hull_Feature_Adaptation_for_Oriented_and_Densely_Packed_CVPR_2021_paper.html
- Code: https://github.com/SDL-GuoZonghao/BeyondBoundingBox

## Few-Shot Object Detection

**26. Accurate Few-Shot Object Detection With Support-Query Mutual Guidance and Hybrid Loss**

- Affiliations: Fudan University, Tongji University, Zhejiang University

- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhang_Accurate_Few-Shot_Object_Detection_With_Support-Query_Mutual_Guidance_and_Hybrid_CVPR_2021_paper.html
- Code: None

**27. Adaptive Image Transformer for One-Shot Object Detection**

- Affiliations: Academia Sinica, Taiwan AI Labs
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Adaptive_Image_Transformer_for_One-Shot_Object_Detection_CVPR_2021_paper.html
- Code: None

**28. Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection**

- Affiliations: Peking University, BUPT
- Paper: https://arxiv.org/abs/2103.17115
- Code: https://github.com/hzhupku/DCNet 

**29. Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection**

- Affiliations: Carnegie Mellon University (CMU)

- Paper: https://arxiv.org/abs/2103.01903
- Code: None

**30. FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding**

- Affiliations: University of Southern California, Megvii
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sun_FSCE_Few-Shot_Object_Detection_via_Contrastive_Proposal_Encoding_CVPR_2021_paper.html
- Code: https://github.com/MegviiDetection/FSCE

**31. Hallucination Improves Few-Shot Object Detection**

- Affiliations: University of Illinois Urbana-Champaign
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhang_Hallucination_Improves_Few-Shot_Object_Detection_CVPR_2021_paper.html
- Code: https://github.com/pppplin/HallucFsDet

**32. Few-Shot Object Detection via Classification Refinement and Distractor Retreatment**

- Affiliations: National University of Singapore, SIMTech
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Few-Shot_Object_Detection_via_Classification_Refinement_and_Distractor_Retreatment_CVPR_2021_paper.html
- Code: None

**33. Generalized Few-Shot Object Detection Without Forgetting**

- Affiliations: Megvii
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Fan_Generalized_Few-Shot_Object_Detection_Without_Forgetting_CVPR_2021_paper.html
- Code: None

**34. Transformation Invariant Few-Shot Object Detection**

- Affiliations: Huawei Noah's Ark Lab

- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Transformation_Invariant_Few-Shot_Object_Detection_CVPR_2021_paper.html
- Code: None

**35. UniT: Unified Knowledge Transfer for Any-Shot Object Detection and Segmentation**

- Affiliations: University of British Columbia, Vector AI, CIFAR AI Chair
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Khandelwal_UniT_Unified_Knowledge_Transfer_for_Any-Shot_Object_Detection_and_Segmentation_CVPR_2021_paper.html
- Code: https://github.com/ubc-vision/UniT

**36. Beyond Max-Margin: Class Margin Equilibrium for Few-Shot Object Detection**

- Affiliations: UCAS, Xiamen University, Peng Cheng Laboratory
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Beyond_Max-Margin_Class_Margin_Equilibrium_for_Few-Shot_Object_Detection_CVPR_2021_paper.html
- Code: https://github.com/Bohao-Lee/CME

## Semi-Supervised Object Detection

**37. Points As Queries: Weakly Semi-Supervised Object Detection by Points**

- Affiliations: Megvii, Fudan University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Points_As_Queries_Weakly_Semi-Supervised_Object_Detection_by_Points_CVPR_2021_paper.html
- Code: None

**38. Data-Uncertainty Guided Multi-Phase Learning for Semi-Supervised Object Detection**

- Affiliations: Tsinghua University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Data-Uncertainty_Guided_Multi-Phase_Learning_for_Semi-Supervised_Object_Detection_CVPR_2021_paper.html
- Code: None

**39. Positive-Unlabeled Data Purification in the Wild for Object Detection**

- Affiliations: Huawei Noah's Ark Lab, University of Sydney, Peking University

- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Guo_Positive-Unlabeled_Data_Purification_in_the_Wild_for_Object_Detection_CVPR_2021_paper.html
- Code: None

**40. Interactive Self-Training With Mean Teachers for Semi-Supervised Object Detection**

- Affiliations: Alibaba, The Hong Kong Polytechnic University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Yang_Interactive_Self-Training_With_Mean_Teachers_for_Semi-Supervised_Object_Detection_CVPR_2021_paper.html
- Code: None

**41. Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework**

- Affiliations: Alibaba
- Paper: https://arxiv.org/abs/2103.11402
- Code: None

**42. Humble Teachers Teach Better Students for Semi-Supervised Object Detection**

- Affiliations: Carnegie Mellon University (CMU), Amazon
- Homepage: https://yihet.com/humble-teacher
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Tang_Humble_Teachers_Teach_Better_Students_for_Semi-Supervised_Object_Detection_CVPR_2021_paper.html
- Code: https://github.com/lryta/HumbleTeacher

**43. Interpolation-Based Semi-Supervised Learning for Object Detection**

- Affiliations: Seoul National University, Aalto University, et al.
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Jeong_Interpolation-Based_Semi-Supervised_Learning_for_Object_Detection_CVPR_2021_paper.html
- Code: https://github.com/soo89/ISD-SSD

## Domain Adaptive Object Detection

**44. Domain-Specific Suppression for Adaptive Object Detection**

- Affiliations: CAS, Cambricon, UCAS
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Domain-Specific_Suppression_for_Adaptive_Object_Detection_CVPR_2021_paper.html
- Code: None

**45. MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection**

- Affiliations: Johns Hopkins University, Mercedes-Benz
- Paper: https://arxiv.org/abs/2103.04224
- Code: None

**46. Unbiased Mean Teacher for Cross-Domain Object Detection**

- Affiliations: UESTC, ETH Zurich
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Deng_Unbiased_Mean_Teacher_for_Cross-Domain_Object_Detection_CVPR_2021_paper.html
- Code: https://github.com/kinredon/umt

**47. I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors**

- Affiliations: The University of Hong Kong, Xiamen University, Deepwise AI Lab
- Paper: https://arxiv.org/abs/2103.13757
- Code: None 

## Self-Supervised Object Detection

**48. There Is More Than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking With Sound by Distilling Multimodal Knowledge**

- Affiliations: University of Freiburg
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Valverde_There_Is_More_Than_Meets_the_Eye_Self-Supervised_Multi-Object_Detection_CVPR_2021_paper.html
- Code: http://rl.uni-freiburg.de/research/multimodal-distill

**49. Instance Localization for Self-supervised Detection Pretraining**

- Affiliations: The Chinese University of Hong Kong, Microsoft Research Asia
- Paper: https://arxiv.org/abs/2102.08318
- Code: https://github.com/limbo0000/InstanceLoc

## Weakly Supervised Object Detection

**50. Informative and Consistent Correspondence Mining for Cross-Domain Weakly Supervised Object Detection**

- Affiliations: Beihang University, Peng Cheng Laboratory, SenseTime
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hou_Informative_and_Consistent_Correspondence_Mining_for_Cross-Domain_Weakly_Supervised_Object_CVPR_2021_paper.html
- Code: None

**51. DAP: Detection-Aware Pre-training with Weak Supervision** 

- Affiliations: UIUC, Microsoft
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhong_DAP_Detection-Aware_Pre-Training_With_Weak_Supervision_CVPR_2021_paper.html
- Code: None

## Others

**52. Open-Vocabulary Object Detection Using Captions**

- Affiliations: Snap, Columbia University

- Paper(Oral): https://openaccess.thecvf.com/content/CVPR2021/html/Zareian_Open-Vocabulary_Object_Detection_Using_Captions_CVPR_2021_paper.html
- Code: https://github.com/alirezazareian/ovr-cnn

**53. Depth From Camera Motion and Object Detection**

- Affiliations: University of Michigan, SIAI

- Paper: https://arxiv.org/abs/2103.01468
- Code: https://github.com/griffbr/ODMD
- Dataset: https://github.com/griffbr/ODMD

**54. Unsupervised Object Detection With LIDAR Clues**

- Affiliations: SenseTime, UCAS, USTC
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Tian_Unsupervised_Object_Detection_With_LIDAR_Clues_CVPR_2021_paper.html
- Code: None

**55. GAIA: A Transfer Learning System of Object Detection That Fits Your Needs**

- Affiliations: UCAS, Beijing Institute of Technology, CAS, SenseTime
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Bu_GAIA_A_Transfer_Learning_System_of_Object_Detection_That_Fits_CVPR_2021_paper.html
- Code: https://github.com/GAIA-vision/GAIA-det

**56. General Instance Distillation for Object Detection**

- Affiliations: Megvii, Beihang University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Dai_General_Instance_Distillation_for_Object_Detection_CVPR_2021_paper.html
- Code: None

**57. AQD: Towards Accurate Quantized Object Detection**

- Affiliations: Monash University, University of Adelaide, South China University of Technology
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_AQD_Towards_Accurate_Quantized_Object_Detection_CVPR_2021_paper.html
- Code: https://github.com/aim-uofa/model-quantization

**58. Scale-Aware Automatic Augmentation for Object Detection**

- Affiliations: The Chinese University of Hong Kong, ByteDance AI Lab, SmartMore
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Scale-Aware_Automatic_Augmentation_for_Object_Detection_CVPR_2021_paper.html
- Code: https://github.com/Jia-Research-Lab/SA-AutoAug

**59. Equalization Loss v2: A New Gradient Balance Approach for Long-Tailed Object Detection**

- Affiliations: Tongji University, SenseTime, Tsinghua University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Tan_Equalization_Loss_v2_A_New_Gradient_Balance_Approach_for_Long-Tailed_CVPR_2021_paper.html
- Code: https://github.com/tztztztztz/eqlv2

**60. Class-Aware Robust Adversarial Training for Object Detection**

- Affiliations: Columbia University, Academia Sinica
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Class-Aware_Robust_Adversarial_Training_for_Object_Detection_CVPR_2021_paper.html
- Code: None

**61. Improved Handling of Motion Blur in Online Object Detection**

- Affiliations: University College London
- Homepage: http://visual.cs.ucl.ac.uk/pubs/handlingMotionBlur/
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sayed_Improved_Handling_of_Motion_Blur_in_Online_Object_Detection_CVPR_2021_paper.html
- Code: None

**62. Neural Auto-Exposure for High-Dynamic Range Object Detection**

- Affiliations: Algolux, Princeton University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Onzon_Neural_Auto-Exposure_for_High-Dynamic_Range_Object_Detection_CVPR_2021_paper.html
- Code: None

**63. Generalizable Pedestrian Detection: The Elephant in the Room**

- Affiliations: IIAI, Aalto University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hasan_Generalizable_Pedestrian_Detection_The_Elephant_in_the_Room_CVPR_2021_paper.html
- Code: https://github.com/hasanirtiza/Pedestron

<a name="Object-Tracking"></a>

# Single/Multi-Object Tracking

## Single-Object Tracking

**LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search**

- Paper: https://arxiv.org/abs/2104.14545

- Code: https://github.com/researchmm/LightTrack

**Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark**

- Homepage: https://sites.google.com/view/langtrackbenchmark/

- Paper: https://arxiv.org/abs/2103.16746
- Evaluation Toolkit: https://github.com/wangxiao5791509/TNL2K_evaluation_toolkit
- Demo Video: https://www.youtube.com/watch?v=7lvVDlkkff0&ab_channel=XiaoWang 

**IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking**

- Paper: https://arxiv.org/abs/2103.14938
- Code: https://github.com/VISION-SJTU/IoUattack

**Graph Attention Tracking**

- Paper: https://arxiv.org/abs/2011.11204
- Code: https://github.com/ohhhyeahhh/SiamGAT

**Rotation Equivariant Siamese Networks for Tracking**

- Paper: https://arxiv.org/abs/2012.13078
- Code: None

**Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking**

- Paper(Oral): https://arxiv.org/abs/2103.11681

- Code: https://github.com/594422814/TransformerTrack

**Transformer Tracking**

- Paper: https://arxiv.org/abs/2103.15436
- Code: https://github.com/chenxin-dlut/TransT

## Multi-Object Tracking

**Tracking Pedestrian Heads in Dense Crowd**

- Homepage: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sundararaman_Tracking_Pedestrian_Heads_in_Dense_Crowd_CVPR_2021_paper.html
- Code1: https://github.com/Sentient07/HeadHunter
- Code2: https://github.com/Sentient07/HeadHunter%E2%80%93T
- Dataset: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/

**Multiple Object Tracking with Correlation Learning**

- Paper: https://arxiv.org/abs/2104.03541
- Code: None

**Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking**

- Paper: https://arxiv.org/abs/2012.02337
- Code: None

**Learning a Proposal Classifier for Multiple Object Tracking**

- Paper: https://arxiv.org/abs/2103.07889
- Code: https://github.com/daip13/LPC_MOT.git

**Track to Detect and Segment: An Online Multi-Object Tracker**

- Homepage: https://jialianwu.com/projects/TraDeS.html
- Paper: https://arxiv.org/abs/2103.08808
- Code: https://github.com/JialianW/TraDeS

<a name="Semantic-Segmentation"></a>

# Semantic Segmentation

**1. HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation**

- Affiliations: Facebook AI, Bar-Ilan University, Tel Aviv University

- Homepage: https://nirkin.com/hyperseg/
- Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Nirkin_HyperSeg_Patch-Wise_Hypernetwork_for_Real-Time_Semantic_Segmentation_CVPR_2021_paper.pdf

- Code: https://github.com/YuvalNirkin/hyperseg

**2. Rethinking BiSeNet For Real-time Semantic Segmentation**

- Affiliations: Meituan

- Paper: https://arxiv.org/abs/2104.13188

- Code: https://github.com/MichaelFan01/STDC-Seg

**3. Progressive Semantic Segmentation**

- Affiliations: VinAI Research, VinUniversity, University of Arkansas, Stony Brook University
- Paper: https://arxiv.org/abs/2104.03778
- Code: https://github.com/VinAIResearch/MagNet

**4. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers**

- Affiliations: Fudan University, University of Oxford, University of Surrey, Tencent Youtu Lab, Facebook AI
- Homepage: https://fudan-zvg.github.io/SETR
- Paper: https://arxiv.org/abs/2012.15840
- Code: https://github.com/fudan-zvg/SETR

**5. Capturing Omni-Range Context for Omnidirectional Segmentation**

- Affiliations: Karlsruhe Institute of Technology, Carl Zeiss, Huawei
- Paper: https://arxiv.org/abs/2103.05687
- Code: None

**6. Learning Statistical Texture for Semantic Segmentation**

- Affiliations: Beihang University, SenseTime
- Paper: https://arxiv.org/abs/2103.04133
- Code: None

**7. InverseForm: A Loss Function for Structured Boundary-Aware Segmentation**

- Affiliations: Qualcomm AI Research
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Borse_InverseForm_A_Loss_Function_for_Structured_Boundary-Aware_Segmentation_CVPR_2021_paper.html
- Code: None

**8. DCNAS: Densely Connected Neural Architecture Search for Semantic Image Segmentation**

- Affiliations: Joyy Inc, Kuaishou, Beihang University, et al.
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhang_DCNAS_Densely_Connected_Neural_Architecture_Search_for_Semantic_Image_Segmentation_CVPR_2021_paper.html
- Code: None

## Weakly Supervised Semantic Segmentation

**9. Railroad Is Not a Train: Saliency As Pseudo-Pixel Supervision for Weakly Supervised Semantic Segmentation**

- Affiliations: Yonsei University, Sungkyunkwan University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Lee_Railroad_Is_Not_a_Train_Saliency_As_Pseudo-Pixel_Supervision_for_CVPR_2021_paper.html
- Code: https://github.com/halbielee/EPS

**10. Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation**

- Affiliations: Yonsei University
- Homepage:  https://cvlab.yonsei.ac.kr/projects/BANA/ 
- Paper: https://arxiv.org/abs/2104.00905
- Code: None

**11. Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation**

- Affiliations: Nanjing University of Science and Technology, MBZUAI, UESTC, University of Adelaide, University of Technology Sydney

- Paper: https://arxiv.org/abs/2103.14581
- Code: https://github.com/NUST-Machine-Intelligence-Laboratory/nsrom

**12. Embedded Discriminative Attention Mechanism for Weakly Supervised Semantic Segmentation**

- Affiliations: Beijing Institute of Technology, Meituan
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wu_Embedded_Discriminative_Attention_Mechanism_for_Weakly_Supervised_Semantic_Segmentation_CVPR_2021_paper.html
- Code: https://github.com/allenwu97/EDAM

**13. BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation**

- Affiliations: Seoul National University
- Paper: https://arxiv.org/abs/2103.08907
- Code: https://github.com/jbeomlee93/BBAM

## Semi-Supervised Semantic Segmentation

**14. Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision**

- Affiliations: Peking University, Microsoft Research Asia
- Paper: https://arxiv.org/abs/2106.01226
- Code: https://github.com/charlesCXK/TorchSemiSeg

**15. Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation**

- Affiliations: Huawei, Dalian University of Technology, Peking University
- Paper: https://arxiv.org/abs/2103.04705
- Code: None

**16. Semi-Supervised Semantic Segmentation With Directional Context-Aware Consistency**

- Affiliations: The Chinese University of Hong Kong, SmartMore, University of Oxford
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Lai_Semi-Supervised_Semantic_Segmentation_With_Directional_Context-Aware_Consistency_CVPR_2021_paper.html
- Code: None

**17. Semantic Segmentation With Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization**

- Affiliations: NVIDIA, University of Toronto, Yale University, MIT, Vector Institute
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Semantic_Segmentation_With_Generative_Models_Semi-Supervised_Learning_and_Strong_Out-of-Domain_CVPR_2021_paper.html
- Code: https://nv-tlabs.github.io/semanticGAN/

**18. Three Ways To Improve Semantic Segmentation With Self-Supervised Depth Estimation**

- Affiliations: ETH Zurich, University of Bonn, KU Leuven
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hoyer_Three_Ways_To_Improve_Semantic_Segmentation_With_Self-Supervised_Depth_Estimation_CVPR_2021_paper.html
- Code: https://github.com/lhoyer/improving_segmentation_with_selfsupervised_depth

## Domain Adaptive Semantic Segmentation

**19. Cluster, Split, Fuse, and Update: Meta-Learning for Open Compound Domain Adaptive Semantic Segmentation**

- Affiliations: ETH Zurich, KU Leuven, UESTC

- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Gong_Cluster_Split_Fuse_and_Update_Meta-Learning_for_Open_Compound_Domain_CVPR_2021_paper.html
- Code: None

**20. Source-Free Domain Adaptation for Semantic Segmentation**

- Affiliations: East China Normal University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_Source-Free_Domain_Adaptation_for_Semantic_Segmentation_CVPR_2021_paper.html
- Code: None

**21. Uncertainty Reduction for Model Adaptation in Semantic Segmentation**

- Affiliations: Idiap Research Institute, EPFL, University of Geneva
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/S_Uncertainty_Reduction_for_Model_Adaptation_in_Semantic_Segmentation_CVPR_2021_paper.html
- Code: https://git.io/JthPp

**22. Self-Supervised Augmentation Consistency for Adapting Semantic Segmentation**

- Affiliations: TU Darmstadt, hessian.AI
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Araslanov_Self-Supervised_Augmentation_Consistency_for_Adapting_Semantic_Segmentation_CVPR_2021_paper.html
- Code: https://github.com/visinf/da-sac

**23. RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening**

- Affiliations: LG AI Research, KAIST, et al.
- Paper: https://arxiv.org/abs/2103.15597
- Code: https://github.com/shachoi/RobustNet

**24. Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization**

- Affiliations: The University of Hong Kong, Deepwise Healthcare
- Paper: https://arxiv.org/abs/2103.13041
- Code: None

**25. MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation**

- Affiliations: City University of Hong Kong, Baidu
- Paper: https://arxiv.org/abs/2103.05254
- Code: https://github.com/cyang-cityu/MetaCorrection

**26. Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation**

- Affiliations: Huawei Cloud, Huawei Noah's Ark Lab, Dalian University of Technology
- Paper: https://arxiv.org/abs/2103.04717
- Code: None

**27. Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation**

- Affiliations: University of Science and Technology of China, Microsoft Research Asia
- Paper: https://arxiv.org/abs/2101.10979
- Code: https://github.com/microsoft/ProDA

**28. DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation**

- Affiliations: University of South Carolina, Farsee2 Technology
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wu_DANNet_A_One-Stage_Domain_Adaptation_Network_for_Unsupervised_Nighttime_Semantic_CVPR_2021_paper.html
- Code: https://github.com/W-zx-Y/DANNet

## Few-Shot Semantic Segmentation

**29. Scale-Aware Graph Neural Network for Few-Shot Semantic Segmentation**

- Affiliations: MBZUAI, IIAI, Harbin Institute of Technology
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Xie_Scale-Aware_Graph_Neural_Network_for_Few-Shot_Semantic_Segmentation_CVPR_2021_paper.html
- Code: None

**30. Anti-Aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation**

- Affiliations: UCAS, Tsinghua University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_Anti-Aliasing_Semantic_Reconstruction_for_Few-Shot_Semantic_Segmentation_CVPR_2021_paper.html
- Code: https://github.com/Bibkiller/ASR 

## Unsupervised Semantic Segmentation

**31. PiCIE: Unsupervised Semantic Segmentation Using Invariance and Equivariance in Clustering**

- Affiliations: UT-Austin, Cornell University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Cho_PiCIE_Unsupervised_Semantic_Segmentation_Using_Invariance_and_Equivariance_in_Clustering_CVPR_2021_paper.html
- Code: https://github.com/janghyuncho/PiCIE

## Video Semantic Segmentation

**32. VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild**

- Affiliations: Zhejiang University, Baidu, University of Technology Sydney
- Homepage: https://www.vspwdataset.com/
- Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf
- GitHub: https://github.com/sssdddwww2/vspw_dataset_download

## Others

**33. Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations**

- Affiliations: University of Padova

- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Michieli_Continual_Semantic_Segmentation_via_Repulsion-Attraction_of_Sparse_and_Disentangled_Latent_CVPR_2021_paper.html
- Code: https://lttm.dei.unipd.it/paper_data/SDR/

**34. Exploit Visual Dependency Relations for Semantic Segmentation**

- Affiliations: University of Illinois Chicago
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_Exploit_Visual_Dependency_Relations_for_Semantic_Segmentation_CVPR_2021_paper.html
- Code: None

**35. Revisiting Superpixels for Active Learning in Semantic Segmentation With Realistic Annotation Costs**

- Affiliations: Institute for Infocomm Research, National University of Singapore
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Cai_Revisiting_Superpixels_for_Active_Learning_in_Semantic_Segmentation_With_Realistic_CVPR_2021_paper.html
- Code: None

**36. PLOP: Learning without Forgetting for Continual Semantic Segmentation**

- Affiliations: Sorbonne University, Heuritech, Datakalab, Valeo.ai
- Paper: https://arxiv.org/abs/2011.11390
- Code: https://github.com/arthurdouillard/CVPR2021_PLOP

**37. 3D-to-2D Distillation for Indoor Scene Parsing**

- Affiliations: The Chinese University of Hong Kong, The University of Hong Kong
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_3D-to-2D_Distillation_for_Indoor_Scene_Parsing_CVPR_2021_paper.html
- Code: None

**38. Bidirectional Projection Network for Cross Dimension Scene Understanding**

- Affiliations: The Chinese University of Hong Kong, University of Oxford, et al.
- Paper(Oral): https://arxiv.org/abs/2103.14326
- Code: https://github.com/wbhu/BPNet

**39. PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation**

- Affiliations: Peking University, CAS, UCAS, ETH Zurich, SenseTime, et al.

- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_PointFlow_Flowing_Semantics_Through_Points_for_Aerial_Image_Segmentation_CVPR_2021_paper.html
- Code: https://github.com/lxtGH/PFSegNets

<a name="Instance-Segmentation"></a>

# Instance Segmentation

**DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation**

- Paper: https://arxiv.org/abs/2011.09876
- Code: https://github.com/aliyun/DCT-Mask

**Incremental Few-Shot Instance Segmentation**

- Paper: https://arxiv.org/abs/2105.05312
- Code: https://github.com/danganea/iMTFA

**A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation**

- Paper: https://arxiv.org/abs/2105.03186
- Code: None

**RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features**

- Paper: https://arxiv.org/abs/2104.08569
- Code: https://github.com/zhanggang001/RefineMask/

**Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation**

- Paper: https://arxiv.org/abs/2104.05239
- Code: https://github.com/tinyalpha/BPR

**Multi-Scale Aligned Distillation for Low-Resolution Detection**

- Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf

- Code: https://github.com/Jia-Research-Lab/MSAD

**Boundary IoU: Improving Object-Centric Image Segmentation Evaluation**

- Homepage: https://bowenc0221.github.io/boundary-iou/
- Paper: https://arxiv.org/abs/2103.16562

- Code: https://github.com/bowenc0221/boundary-iou-api

**Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers**

- Paper: https://arxiv.org/abs/2103.12340

- Code: https://github.com/lkeab/BCNet 

**Zero-Shot Instance Segmentation (Not Sure)**

- Paper: None
- Code: https://github.com/CVPR2021-pape-id-1395/CVPR2021-paper-id-1395

## Video Instance Segmentation

**STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation**

- Paper: http://www4.comp.polyu.edu.hk/~cslzhang/papers.htm
- Code: https://github.com/MinghanLi/STMask

**End-to-End Video Instance Segmentation with Transformers**

- Paper(Oral): https://arxiv.org/abs/2011.14503
- Code: https://github.com/Epiphqny/VisTR

<a name="Panoptic-Segmentation"></a>

# Panoptic Segmentation

**ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2012.05258
- Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab
- Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab

**Part-aware Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2106.06351
- Code: https://github.com/tue-mps/panoptic_parts
- Dataset: https://github.com/tue-mps/panoptic_parts

**Exemplar-Based Open-Set Panoptic Segmentation Network**

- Homepage: https://cv.snu.ac.kr/research/EOPSN/
- Paper: https://arxiv.org/abs/2105.08336
- Code: https://github.com/jd730/EOPSN

**MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers**

- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_MaX-DeepLab_End-to-End_Panoptic_Segmentation_With_Mask_Transformers_CVPR_2021_paper.html
- Code: None

**Panoptic Segmentation Forecasting**

- Paper: https://arxiv.org/abs/2104.03962
- Code: https://github.com/nianticlabs/panoptic-forecasting

**Fully Convolutional Networks for Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2012.00720

- Code: https://github.com/yanwei-li/PanopticFCN

**Cross-View Regularization for Domain Adaptive Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2103.02584
- Code: None

<a name="Medical-Image-Segmentation"></a>

# Medical Image Segmentation

**1. Learning Calibrated Medical Image Segmentation via Multi-Rater Agreement Modeling**

- Affiliations: Tencent Jarvis Lab, Beijing Tongren Hospital
- Paper(Best Paper Candidate): https://openaccess.thecvf.com/content/CVPR2021/html/Ji_Learning_Calibrated_Medical_Image_Segmentation_via_Multi-Rater_Agreement_Modeling_CVPR_2021_paper.html
- Code: https://github.com/jiwei0921/MRNet/

**2. Every Annotation Counts: Multi-Label Deep Supervision for Medical Image Segmentation**

- Affiliations: Karlsruhe Institute of Technology, Carl Zeiss, et al.
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Reiss_Every_Annotation_Counts_Multi-Label_Deep_Supervision_for_Medical_Image_Segmentation_CVPR_2021_paper.html
- Code: None

**3. FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space**

- Affiliations: The Chinese University of Hong Kong, The Hong Kong Polytechnic University
- Paper: https://arxiv.org/abs/2103.06030
- Code: https://github.com/liuquande/FedDG-ELCFS

**4. DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation**

- Affiliations: Johns Hopkins University, NVIDIA
- Paper(Oral): https://arxiv.org/abs/2103.15954
- Code: None

**5. DARCNN: Domain Adaptive Region-Based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images**

- Affiliations: Stanford University

- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hsu_DARCNN_Domain_Adaptive_Region-Based_Convolutional_Neural_Network_for_Unsupervised_Instance_CVPR_2021_paper.html
- Code: None

<a name="VOS"></a>

# Video Object Segmentation

**Learning Position and Target Consistency for Memory-based Video Object Segmentation**

- Paper: https://arxiv.org/abs/2104.04329
- Code: None

**SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation**

- Paper(Oral): https://arxiv.org/abs/2101.08833
- Code: https://github.com/dukebw/SSTVOS

<a name="IVOS"></a>

# Interactive Video Object Segmentation

**Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion**

- Homepage: https://hkchengrex.github.io/MiVOS/

- Paper: https://arxiv.org/abs/2103.07941

- Code: https://github.com/hkchengrex/MiVOS
- Demo: https://hkchengrex.github.io/MiVOS/video.html#partb

**Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild**

- Paper: https://arxiv.org/abs/2103.10391

- Code: https://github.com/svip-lab/IVOS-W

<a name="Saliency-Detection"></a>

# Saliency Detection

**Uncertainty-aware Joint Salient Object and Camouflaged Object Detection**

- Paper: https://arxiv.org/abs/2104.02628

- Code: https://github.com/JingZhang617/Joint_COD_SOD

**Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion**

- Paper(Oral): https://arxiv.org/abs/2103.11832
- Code: https://github.com/sunpeng1996/DSA2F

<a name="Camouflaged-Object-Detection"></a>

# Camouflaged Object Detection

**Uncertainty-aware Joint Salient Object and Camouflaged Object Detection**

- Paper: https://arxiv.org/abs/2104.02628

- Code: https://github.com/JingZhang617/Joint_COD_SOD

<a name="CoSOD"></a>

# Co-Salient Object Detection

**Group Collaborative Learning for Co-Salient Object Detection**

- Paper: https://arxiv.org/abs/2104.01108
- Code: https://github.com/fanq15/GCoNet

<a name="Matting"></a>

# Image Matting

**Semantic Image Matting**

- Paper: https://arxiv.org/abs/2104.08201
- Code: https://github.com/nowsyn/SIM
- Dataset: https://github.com/nowsyn/SIM

<a name="Re-ID"></a>

# Person Re-identification

**Generalizable Person Re-identification with Relevance-aware Mixture of Experts**

- Paper: https://arxiv.org/abs/2105.09156
- Code: None

**Unsupervised Multi-Source Domain Adaptation for Person Re-Identification**

- Paper: https://arxiv.org/abs/2104.12961
- Code: None

**Combined Depth Space based Architecture Search For Person Re-identification**

- Paper: https://arxiv.org/abs/2104.04163
- Code: None

<a name="Person-Search"></a>

# Person Search

**Anchor-Free Person Search**

- Paper: https://arxiv.org/abs/2103.11617
- Code: https://github.com/daodaofr/AlignPS
- Interpretation: [The First Anchor-Free Person Search Framework | CVPR 2021](https://mp.weixin.qq.com/s/iqJkgp0JBanmeBPyHUkb-A)

<a name="Video-Understanding"></a>

# Video Understanding / Action Recognition

**Temporal-Relational CrossTransformers for Few-Shot Action Recognition**

- Paper: https://arxiv.org/abs/2101.06184
- Code: https://github.com/tobyperrett/trx

**FrameExit: Conditional Early Exiting for Efficient Video Recognition**

- Paper(Oral): https://arxiv.org/abs/2104.13400
- Code: None

**No frame left behind: Full Video Action Recognition**

- Paper: https://arxiv.org/abs/2103.15395
- Code: None

**Learning Salient Boundary Feature for Anchor-free Temporal Action Localization**

- Paper: https://arxiv.org/abs/2103.13137
- Code: None

**Temporal Context Aggregation Network for Temporal Action Proposal Refinement**

- Paper: https://arxiv.org/abs/2103.13141
- Code: None
- Interpretation: [CVPR 2021 | TCANet：最强时序动作提名修正网络](https://mp.weixin.qq.com/s/UOWMfpTljkyZznHtpkQBhA)

**ACTION-Net: Multipath Excitation for Action Recognition**

- Paper: https://arxiv.org/abs/2103.07372
- Code: https://github.com/V-Sense/ACTION-Net

**Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning**

- Homepage: https://fingerrec.github.io/index_files/jinpeng/papers/CVPR2021/project_website.html
- Paper: https://arxiv.org/abs/2009.05769
- Code: https://github.com/FingerRec/BE

**TDN: Temporal Difference Networks for Efficient Action Recognition**

- Paper: https://arxiv.org/abs/2012.10071
- Code: https://github.com/MCG-NJU/TDN

<a name="Face-Recognition"></a>

# 人脸识别(Face Recognition)

**A 3D GAN for Improved Large-pose Facial Recognition**

- Paper: https://arxiv.org/abs/2012.10545
- Code: None

**MagFace: A Universal Representation for Face Recognition and Quality Assessment**

- Paper(Oral): https://arxiv.org/abs/2103.06627
- Code: https://github.com/IrvingMeng/MagFace

**WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition**

- Homepage: https://www.face-benchmark.org/
- Paper: https://arxiv.org/abs/2103.04098 
- Dataset: https://www.face-benchmark.org/

**When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework**

- Paper(Oral): https://arxiv.org/abs/2103.01520
- Code: https://github.com/Hzzone/MTLFace
- Dataset: https://github.com/Hzzone/MTLFace

<a name="Face-Detection"></a>

# 人脸检测(Face Detection)

**HLA-Face: Joint High-Low Adaptation for Low Light Face Detection**

- Homepage: https://daooshee.github.io/HLA-Face-Website/
- Paper: https://arxiv.org/abs/2104.01984
- Code: https://github.com/daooshee/HLA-Face-Code

**CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement**

- Paper: https://arxiv.org/abs/2103.07017
- Code: None

<a name="Face-Anti-Spoofing"></a>

# 人脸活体检测(Face Anti-Spoofing)

**Cross Modal Focal Loss for RGBD Face Anti-Spoofing**

- Paper: https://arxiv.org/abs/2103.00948
- Code: None

<a name="Deepfake-Detection"></a>

# Deepfake检测(Deepfake Detection)

**Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain**

- Paper: https://arxiv.org/abs/2103.01856
- Code: None

**Multi-attentional Deepfake Detection**

- Paper: https://arxiv.org/abs/2103.02406
- Code: None

<a name="Age-Estimation"></a>

# 人脸年龄估计(Age Estimation)

**Continuous Face Aging via Self-estimated Residual Age Embedding**

- Paper: https://arxiv.org/abs/2105.00020
- Code: None

**PML: Progressive Margin Loss for Long-tailed Age Classification**

- Paper: https://arxiv.org/abs/2103.02140
- Code: None

<a name="FER"></a>

# 人脸表情识别(Facial Expression Recognition)

**Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition**

- Paper: https://arxiv.org/abs/2103.13372
- Code: None

<a name="Deepfakes"></a>

# Deepfakes

**MagDR: Mask-guided Detection and Reconstruction for Defending Deepfakes**

- Paper: https://arxiv.org/abs/2103.14211
- Code: None

<a name="Human-Parsing"></a>

# 人体解析(Human Parsing)

**Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing**

- Paper: https://arxiv.org/abs/2103.04570
- Code: https://github.com/tfzhou/MG-HumanParsing

<a name="Human-Pose-Estimation"></a>

# 2D/3D人体姿态估计(2D/3D Human Pose Estimation)

## 2D 人体姿态估计(2D Human Pose Estimation)

**ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search**

- Paper: https://arxiv.org/abs/2105.10154
- Code: None

**When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks**

- Paper: https://arxiv.org/abs/2105.06152
- Code: None

**Pose Recognition with Cascade Transformers**

- Paper: https://arxiv.org/abs/2104.06976
- Code: https://github.com/mlpc-ucsd/PRTR

**DCPose: Deep Dual Consecutive Network for Human Pose Estimation**

- Paper: https://arxiv.org/abs/2103.07254
- Code: https://github.com/Pose-Group/DCPose

## 3D 人体姿态估计(3D Human Pose Estimation)

**End-to-End Human Pose and Mesh Reconstruction with Transformers**

- Paper: https://arxiv.org/abs/2012.09760
- Code: https://github.com/microsoft/MeshTransformer

**PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation**

- Paper(Oral): https://arxiv.org/abs/2105.02465
- Code: https://github.com/jfzhang95/PoseAug

**Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration**

- Paper: https://arxiv.org/abs/2103.02845
- Code: https://github.com/SeanChenxy/HandMesh

**Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks**

- Paper: https://arxiv.org/abs/2104.01797
- Code: https://github.com/3dpose/3D-Multi-Person-Pose

**HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation**

- Homepage: https://jeffli.site/HybrIK/ 
- Paper: https://arxiv.org/abs/2011.14672
- Code: https://github.com/Jeff-sjtu/HybrIK

<a name="Animal-Pose-Estimation"></a>

# 动物姿态估计(Animal Pose Estimation)

**From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation**

- Paper: https://arxiv.org/abs/2103.14843
- Code: None

<a name="Hand-Pose-Estimation"></a>

# 手部姿态估计(Hand Pose Estimation)

**Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time**

- Homepage: https://stevenlsw.github.io/Semi-Hand-Object/
- Paper: https://arxiv.org/abs/2106.05266
- Code: https://github.com/stevenlsw/Semi-Hand-Object

<a name="Human-Volumetric-Capture"></a>

# Human Volumetric Capture

**POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture**

- Homepage: http://www.liuyebin.com/posefusion/posefusion.html
- Paper(Oral): https://arxiv.org/abs/2103.15331
- Code: None

<a name="Scene-Text-Detection"></a>

# 场景文本检测(Scene Text Detection)

**Fourier Contour Embedding for Arbitrary-Shaped Text Detection**

- Paper: https://arxiv.org/abs/2104.10442
- Code: None

<a name="Scene-Text-Recognition"></a>

# 场景文本识别(Scene Text Recognition)

**Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition**

- Paper: https://arxiv.org/abs/2103.06495
- Code: https://github.com/FangShancheng/ABINet

<a name="Image-Compression"></a>

# 图像压缩(Image Compression)

**Checkerboard Context Model for Efficient Learned Image Compression**

- Paper: https://arxiv.org/abs/2103.15306
- Code: None

**Slimmable Compressive Autoencoders for Practical Neural Image Compression**

- Paper: https://arxiv.org/abs/2103.15726
- Code: None

**Attention-guided Image Compression by Deep Reconstruction of Compressive Sensed Saliency Skeleton**

- Paper: https://arxiv.org/abs/2103.15368
- Code: None

<a name="Model-Compression"></a>

# 模型压缩/剪枝/量化(Model Compression/Pruning/Quantization)

**Teachers Do More Than Teach: Compressing Image-to-Image Models**

- Paper: https://arxiv.org/abs/2103.03467
- Code: https://github.com/snap-research/CAT

## 模型剪枝(Model Pruning)

**Dynamic Slimmable Network**

- Paper: https://arxiv.org/abs/2103.13258
- Code: https://github.com/changlin31/DS-Net

## 模型量化(Model Quantization)

**Network Quantization with Element-wise Gradient Scaling**

- Paper: https://arxiv.org/abs/2104.00903
- Code: None

**Zero-shot Adversarial Quantization**

- Paper(Oral): https://arxiv.org/abs/2103.15263
- Code: https://git.io/Jqc0y

**Learnable Companding Quantization for Accurate Low-bit Neural Networks**

- Paper: https://arxiv.org/abs/2103.07156
- Code: None

<a name="KD"></a>

# 知识蒸馏(Knowledge Distillation)

**Distilling Knowledge via Knowledge Review**

- Paper: https://arxiv.org/abs/2104.09044
- Code: https://github.com/Jia-Research-Lab/ReviewKD

**Distilling Object Detectors via Decoupled Features**

- Paper: https://arxiv.org/abs/2103.14475
- Code: https://github.com/ggjy/DeFeat.pytorch

<a name="Super-Resolution"></a>

# 超分辨率(Super-Resolution)

**Image Super-Resolution with Non-Local Sparse Attention**

- Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Mei_Image_Super-Resolution_With_Non-Local_Sparse_Attention_CVPR_2021_paper.pdf
- Code: https://github.com/HarukiYqM/Non-Local-Sparse-Attention

**Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline**

- Homepage: http://mepro.bjtu.edu.cn/resource.html
- Paper: https://arxiv.org/abs/2104.06174
- Code: None

**ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic**

- Paper: https://arxiv.org/abs/2103.04039
- Code: https://github.com/Xiangtaokong/ClassSR

**AdderSR: Towards Energy Efficient Image Super-Resolution**

- Paper: https://arxiv.org/abs/2009.08891
- Code: None

## 视频超分辨率(Video Super-Resolution)

**Temporal Modulation Network for Controllable Space-Time Video Super-Resolution**

- Paper: None
- Code: https://github.com/CS-GangXu/TMNet

<a name="Dehazing"></a>

# 去雾(Dehazing)

**Contrastive Learning for Compact Single Image Dehazing**

- Paper: https://arxiv.org/abs/2104.09367
- Code: https://github.com/GlassyWu/AECR-Net

<a name="Image-Restoration"></a>

# 图像恢复(Image Restoration)

**Multi-Stage Progressive Image Restoration**

- Paper: https://arxiv.org/abs/2102.02808
- Code: https://github.com/swz30/MPRNet

<a name="Image-Inpainting"></a>

# 图像补全(Image Inpainting)

**PD-GAN: Probabilistic Diverse GAN for Image Inpainting**

- Paper: https://arxiv.org/abs/2105.02201
- Code: https://github.com/KumapowerLIU/PD-GAN

**TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations**

- Homepage: https://yzhouas.github.io/projects/TransFill/index.html
- Paper: https://arxiv.org/abs/2103.15982
- Code: None

<a name="Image-Editing"></a>

# 图像编辑(Image Editing)

**StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing**

- Paper: https://arxiv.org/abs/2104.14754
- Code: https://github.com/naver-ai/StyleMapGAN
- Demo Video: https://youtu.be/qCapNyRA_Ng

**High-Fidelity and Arbitrary Face Editing**

- Paper: https://arxiv.org/abs/2103.15814
- Code: None

**Anycost GANs for Interactive Image Synthesis and Editing**

- Paper: https://arxiv.org/abs/2103.03243
- Code: https://github.com/mit-han-lab/anycost-gan

**PISE: Person Image Synthesis and Editing with Decoupled GAN**

- Paper: https://arxiv.org/abs/2103.04023
- Code: https://github.com/Zhangjinso/PISE

**DeFLOCNet: Deep Image Editing via Flexible Low-level Controls**

- Paper: http://raywzy.com/
- Code: http://raywzy.com/

<a name="Image-Captioning"></a>

# 图像描述(Image Captioning)

**Towards Accurate Text-based Image Captioning with Content Diversity Exploration**

- Paper: https://arxiv.org/abs/2105.03236
- Code: None

<a name="Font-Generation"></a>

# 字体生成(Font Generation)

**DG-Font: Deformable Generative Networks for Unsupervised Font Generation**

- Paper: https://arxiv.org/abs/2104.03064
- Code: https://github.com/ecnuycxie/DG-Font

<a name="Image-Matching"></a>

# 图像匹配(Image Matching)

**LoFTR: Detector-Free Local Feature Matching with Transformers**

- Homepage: https://zju3dv.github.io/loftr/
- Paper: https://arxiv.org/abs/2104.00680
- Code: https://github.com/zju3dv/LoFTR

**Convolutional Hough Matching Networks**

- Homepage: http://cvlab.postech.ac.kr/research/CHM/
- Paper(Oral): https://arxiv.org/abs/2103.16831
- Code: None

<a name="Image-Blending"></a>

# 图像融合(Image Blending)

**Bridging the Visual Gap: Wide-Range Image Blending**

- Paper: https://arxiv.org/abs/2103.15149
- Code: https://github.com/julia0607/Wide-Range-Image-Blending

<a name="Reflection-Removal"></a>

# 反光去除(Reflection Removal)

**Robust Reflection Removal with Reflection-free Flash-only Cues**

- Paper: https://arxiv.org/abs/2103.04273
- Code: https://github.com/ChenyangLEI/flash-reflection-removal

<a name="3D-C"></a>

# 3D点云分类(3D Point Cloud Classification)

**Equivariant Point Network for 3D Point Cloud Analysis**

- Paper: https://arxiv.org/abs/2103.14147
- Code: None

**PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds**

- Paper: https://arxiv.org/abs/2103.14635
- Code: https://github.com/CVMI-Lab/PAConv

<a name="3D-Object-Detection"></a>

# 3D目标检测(3D Object Detection)

**3D-MAN: 3D Multi-frame Attention Network for Object Detection**

- Paper: https://arxiv.org/abs/2103.16054
- Code: None

**Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds**

- Paper: https://arxiv.org/abs/2104.06114
- Code: https://github.com/cheng052/BRNet

**HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection**

- Homepage: https://cvlab.yonsei.ac.kr/projects/HVPR/
- Paper: https://arxiv.org/abs/2104.00902
- Code: https://github.com/cvlab-yonsei/HVPR

**LiDAR R-CNN: An Efficient and Universal 3D Object Detector**

- Paper: https://arxiv.org/abs/2103.15297
- Code: https://github.com/tusimple/LiDAR_RCNN

**M3DSSD: Monocular 3D Single Stage Object Detector**

- Paper: https://arxiv.org/abs/2103.13164
- Code: https://github.com/mumianyuxin/M3DSSD

**SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud**

- Paper: None
- Code: https://github.com/Vegeta2020/SE-SSD

**Center-based 3D Object Detection and Tracking**

- Paper: https://arxiv.org/abs/2006.11275
- Code: https://github.com/tianweiy/CenterPoint

**Categorical Depth Distribution Network for Monocular 3D Object Detection**

- Paper: https://arxiv.org/abs/2103.01100
- Code: None

<a name="3D-Semantic-Segmentation"></a>

# 3D语义分割(3D Semantic Segmentation)

**Bidirectional Projection Network for Cross Dimension Scene Understanding**

- Paper(Oral): https://arxiv.org/abs/2103.14326
- Code: https://github.com/wbhu/BPNet

**Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion**

- Paper: https://arxiv.org/abs/2103.07074
- Code: https://github.com/ShiQiu0419/BAAF-Net

**Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation**

- Paper: https://arxiv.org/abs/2011.10033
- Code: https://github.com/xinge008/Cylinder3D

**Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges**

- Homepage: https://github.com/QingyongHu/SensatUrban
- Paper: http://arxiv.org/abs/2009.03137
- Code: https://github.com/QingyongHu/SensatUrban
- Dataset: https://github.com/QingyongHu/SensatUrban

<a name="3D-Panoptic-Segmentation"></a>

# 3D全景分割(3D Panoptic Segmentation)

**Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2103.14962
- Code: https://github.com/edwardzhou130/Panoptic-PolarNet

<a name="3D-Object-Tracking"></a>

# 3D目标跟踪(3D Object Tracking)

**Center-based 3D Object Detection and Tracking**

- Paper: https://arxiv.org/abs/2006.11275
- Code: https://github.com/tianweiy/CenterPoint

<a name="3D-PointCloud-Registration"></a>

# 3D点云配准(3D Point Cloud Registration)

**ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning**

- Paper: https://arxiv.org/abs/2103.15231
- Code: None

**PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency**

- Paper: https://arxiv.org/abs/2103.05465
- Code: https://github.com/XuyangBai/PointDSC 

**PREDATOR: Registration of 3D Point Clouds with Low Overlap**

- Paper: https://arxiv.org/abs/2011.13005
- Code: https://github.com/ShengyuH/OverlapPredator

<a name="3D-Point-Cloud-Completion"></a>

# 3D点云补全(3D Point Cloud Completion)

**Unsupervised 3D Shape Completion through GAN Inversion**

- Homepage: https://junzhezhang.github.io/projects/ShapeInversion/
- Paper: https://arxiv.org/abs/2104.13366 
- Code: https://github.com/junzhezhang/shape-inversion 

**Variational Relational Point Completion Network**

- Homepage: https://paul007pl.github.io/projects/VRCNet
- Paper: https://arxiv.org/abs/2104.10154
- Code: https://github.com/paul007pl/VRCNet

**Style-based Point Generator with Adversarial Rendering for Point Cloud Completion**

- Homepage: https://alphapav.github.io/SpareNet/
- Paper: https://arxiv.org/abs/2103.02535
- Code: https://github.com/microsoft/SpareNet

<a name="3D-Reconstruction"></a>

# 3D重建(3D Reconstruction)

**Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection**

- Paper: http://arxiv.org/abs/2106.07852
- Code: https://github.com/TencentYoutuResearch/3DFaceReconstruction-LAP

**Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction**

- Paper: https://arxiv.org/abs/2104.00858
- Code: None

**NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video**

- Homepage: https://zju3dv.github.io/neuralrecon/
- Paper(Oral): https://arxiv.org/abs/2104.00681
- Code: https://github.com/zju3dv/NeuralRecon

<a name="6D-Pose-Estimation"></a>

# 6D位姿估计(6D Pose Estimation)

**FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism**

- Paper(Oral): https://arxiv.org/abs/2103.07054
- Code: https://github.com/DC1991/FS-Net

**GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation**

- Paper: http://arxiv.org/abs/2102.12145
- Code: https://git.io/GDR-Net

**FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation**

- Paper: https://arxiv.org/abs/2103.02242
- Code: https://github.com/ethnhe/FFB6D

<a name="Camera-Pose-Estimation"></a>

# 相机姿态估计(Camera Pose Estimation)

**Back to the Feature: Learning Robust Camera Localization from Pixels to Pose**

- Paper: https://arxiv.org/abs/2103.09213
- Code: https://github.com/cvg/pixloc

<a name="Depth-Estimation"></a>

# 深度估计(Depth Estimation)

**S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation**

- Paper(Oral): https://arxiv.org/abs/2104.00877
- Code: None

**Beyond Image to Depth: Improving Depth Prediction using Echoes**

- Homepage: https://krantiparida.github.io/projects/bimgdepth.html
- Paper: https://arxiv.org/abs/2103.08468
- Code: https://github.com/krantiparida/beyond-image-to-depth

**S3: Learnable Sparse Signal Superdensity for Guided Depth Estimation**

- Paper: https://arxiv.org/abs/2103.02396
- Code: None

**Depth from Camera Motion and Object Detection**

- Paper: https://arxiv.org/abs/2103.01468
- Code: https://github.com/griffbr/ODMD
- Dataset: https://github.com/griffbr/ODMD

<a name="Stereo-Matching"></a>

# 立体匹配(Stereo Matching)

**A Decomposition Model for Stereo Matching**

- Paper: https://arxiv.org/abs/2104.07516
- Code: None

<a name="Flow-Estimation"></a>

# 光流估计(Flow Estimation)

**Self-Supervised Multi-Frame Monocular Scene Flow**

- Paper: https://arxiv.org/abs/2105.02216
- Code: https://github.com/visinf/multi-mono-sf

**RAFT-3D: Scene Flow using Rigid-Motion Embeddings**

- Paper: https://arxiv.org/abs/2012.00726v1
- Code: None

**Learning Optical Flow From Still Images**

- Homepage: https://mattpoggi.github.io/projects/cvpr2021aleotti/
- Paper: https://mattpoggi.github.io/assets/papers/aleotti2021cvpr.pdf
- Code: https://github.com/mattpoggi/depthstillation

**FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds**

- Paper: https://arxiv.org/abs/2104.00798
- Code: None

<a name="Lane-Detection"></a>

# 车道线检测(Lane Detection)

**Focus on Local: Detecting Lane Marker from Bottom Up via Key Point**

- Paper: https://arxiv.org/abs/2105.13680
- Code: None

**Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection**

- Paper: https://arxiv.org/abs/2010.12035
- Code: https://github.com/lucastabelini/LaneATT 

<a name="Trajectory-Prediction"></a>

# 轨迹预测(Trajectory Prediction)

**Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction**

- Paper(Oral): https://arxiv.org/abs/2104.08277
- Code: None

<a name="Crowd-Counting"></a>

# 人群计数(Crowd Counting)

**Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark**

- Paper: https://arxiv.org/abs/2105.02440
- Code: https://github.com/VisDrone/DroneCrowd
- Dataset: https://github.com/VisDrone/DroneCrowd

<a name="AE"></a>

# 对抗样本(Adversarial Examples)

**Enhancing the Transferability of Adversarial Attacks through Variance Tuning**

- Paper: https://arxiv.org/abs/2103.15571
- Code: https://github.com/JHL-HUST/VT

**LiBRe: A Practical Bayesian Approach to Adversarial Detection**

- Paper: https://arxiv.org/abs/2103.14835
- Code: None

**Natural Adversarial Examples**

- Paper: https://arxiv.org/abs/1907.07174
- Code: https://github.com/hendrycks/natural-adv-examples

<a name="Image-Retrieval"></a>

# 图像检索(Image Retrieval)

**StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval**

- Paper: https://arxiv.org/abs/2103.15706
- Code: None

**QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval**

- Paper: https://arxiv.org/abs/2103.02927
- Code: None

<a name="Video-Retrieval"></a>

# 视频检索(Video Retrieval)

**On Semantic Similarity in Video Retrieval**

- Homepage: https://mwray.github.io/SSVR/
- Paper: https://arxiv.org/abs/2103.10095
- Code: https://github.com/mwray/Semantic-Video-Retrieval

<a name="Cross-modal-Retrieval"></a>

# 跨模态检索(Cross-modal Retrieval)

**Cross-Modal Center Loss for 3D Cross-Modal Retrieval**

- Paper: https://arxiv.org/abs/2008.03561
- Code: https://github.com/LongLong-Jing/Cross-Modal-Center-Loss 

**Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers**

- Paper: https://arxiv.org/abs/2103.16553
- Code: None

**Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning**

- Paper: https://www.amazon.science/publications/revamping-cross-modal-recipe-retrieval-with-hierarchical-transformers-and-self-supervised-learning
- Code: https://github.com/amzn/image-to-recipe-transformers

<a name="Zero-Shot-Learning"></a>

# Zero-Shot Learning

**Counterfactual Zero-Shot and Open-Set Visual Recognition**

- Paper: https://arxiv.org/abs/2103.00887
- Code: https://github.com/yue-zhongqi/gcm-cf

<a name="Federated-Learning"></a>

# 联邦学习(Federated Learning)

**FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space**

- Paper: https://arxiv.org/abs/2103.06030
- Code: https://github.com/liuquande/FedDG-ELCFS

<a name="Video-Frame-Interpolation"></a>

# 视频插帧(Video Frame Interpolation)

**CDFI: Compression-Driven Network Design for Frame Interpolation**

- Paper: None
- Code: https://github.com/tding1/CDFI

**FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation**

- Homepage: https://tarun005.github.io/FLAVR/
- Paper: https://arxiv.org/abs/2012.08512
- Code: https://github.com/tarun005/FLAVR

<a name="Visual-Reasoning"></a>

# 视觉推理(Visual Reasoning)

**Transformation Driven Visual Reasoning**

- Homepage: https://hongxin2019.github.io/TVR/
- Paper: https://arxiv.org/abs/2011.13160
- Code: https://github.com/hughplay/TVR

<a name="Image-Synthesis"></a>

# 图像合成(Image Synthesis)

**GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields**

- Homepage: https://m-niemeyer.github.io/project-pages/giraffe/index.html
- Paper(Oral): https://arxiv.org/abs/2011.12100
- Code: https://github.com/autonomousvision/giraffe
- Demo: http://www.youtube.com/watch?v=fIaDXC-qRSg&vq=hd1080&autoplay=1

**Taming Transformers for High-Resolution Image Synthesis**

- Homepage: https://compvis.github.io/taming-transformers/
- Paper(Oral): https://arxiv.org/abs/2012.09841
- Code: https://github.com/CompVis/taming-transformers

<a name="Visual-Synthesis"></a>

# 视图合成(View Synthesis)

**Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes**

- Homepage: https://virtualhumans.mpi-inf.mpg.de/srf/
- Paper: https://arxiv.org/abs/2104.06935

**Self-Supervised Visibility Learning for Novel View Synthesis**

- Paper: https://arxiv.org/abs/2103.15407
- Code: None

**NeX: Real-time View Synthesis with Neural Basis Expansion**

- Homepage: https://nex-mpi.github.io/
- Paper(Oral): https://arxiv.org/abs/2103.05606

<a name="Style-Transfer"></a>

# 风格迁移(Style Transfer)

**Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer**

- Paper: https://arxiv.org/abs/2104.05376
- Code: https://github.com/PaddlePaddle/PaddleGAN/

<a name="Layout-Generation"></a>

# 布局生成(Layout Generation)

**LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity**

- Paper: None
- Code: None

**Variational Transformer Networks for Layout Generation**

- Paper: https://arxiv.org/abs/2104.02416
- Code: None

<a name="Domain-Generalization"></a>

# Domain Generalization

**Generalization on Unseen Domains via Inference-time Label-Preserving Target Projections**

- Paper(Oral): https://openaccess.thecvf.com/content/CVPR2021/papers/Pandey_Generalization_on_Unseen_Domains_via_Inference-Time_Label-Preserving_Target_Projections_CVPR_2021_paper.pdf
- Code: https://github.com/VSumanth99/InferenceTimeDG

**Generalizable Person Re-identification with Relevance-aware Mixture of Experts**

- Paper: https://arxiv.org/abs/2105.09156
- Code: None

**RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening**

- Paper: https://arxiv.org/abs/2103.15597
- Code: https://github.com/shachoi/RobustNet

**Adaptive Methods for Real-World Domain Generalization**

- Paper: https://arxiv.org/abs/2103.15796
- Code: None

**FSDR: Frequency Space Domain Randomization for Domain Generalization**

- Paper: https://arxiv.org/abs/2103.02370
- Code: None

<a name="Domain-Adaptation"></a>

# Domain Adaptation

**Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation**

- Paper: https://arxiv.org/abs/2104.00808
- Code: None

**Domain Consensus Clustering for Universal Domain Adaptation**

- Paper: http://reler.net/papers/guangrui_cvpr2021.pdf
- Code: https://github.com/Solacex/Domain-Consensus-Clustering 

<a name="Open-Set"></a>

# Open-Set

**Towards Open World Object Detection**

- Paper(Oral): https://arxiv.org/abs/2103.02603
- Code: https://github.com/JosephKJ/OWOD

**Exemplar-Based Open-Set Panoptic Segmentation Network**

- Homepage: https://cv.snu.ac.kr/research/EOPSN/
- Paper: https://arxiv.org/abs/2105.08336
- Code: https://github.com/jd730/EOPSN

**Learning Placeholders for Open-Set Recognition**

- Paper(Oral): https://arxiv.org/abs/2103.15086
- Code: None

<a name="Adversarial-Attack"></a>

# Adversarial Attack

**IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking**

- Paper: https://arxiv.org/abs/2103.14938
- Code: https://github.com/VISION-SJTU/IoUattack

<a name="HOI"></a>

# "人-物"交互(HOI)检测

**HOTR: End-to-End Human-Object Interaction Detection with Transformers**

- Paper: https://arxiv.org/abs/2104.13682
- Code: None

**Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information**

- Paper: https://arxiv.org/abs/2103.05399
- Code: https://github.com/hitachi-rd-cv/qpic

**Reformulating HOI Detection as Adaptive Set Prediction**

- Paper: https://arxiv.org/abs/2103.05983
- Code: https://github.com/yoyomimi/AS-Net

**Detecting Human-Object Interaction via Fabricated Compositional Learning**

- Paper: https://arxiv.org/abs/2103.08214
- Code: https://github.com/zhihou7/FCL

**End-to-End Human Object Interaction Detection with HOI Transformer**

- Paper: https://arxiv.org/abs/2103.04503
- Code: https://github.com/bbepoch/HoiTransformer

<a name="Shadow-Removal"></a>

# 阴影去除(Shadow Removal)

**Auto-Exposure Fusion for Single-Image Shadow Removal**

- Paper: https://arxiv.org/abs/2103.01255
- Code: https://github.com/tsingqguo/exposure-fusion-shadow-removal

<a name="Virtual-Try-On"></a>

# 虚拟换衣(Virtual Try-On)

**Parser-Free Virtual Try-on via Distilling Appearance Flows**

- Paper: https://arxiv.org/abs/2103.04559
- Code: https://github.com/geyuying/PF-AFN 

<a name="Label-Noise"></a>

# 标签噪声(Label Noise)

**A Second-Order Approach to Learning with Instance-Dependent Label Noise**

- Paper(Oral): https://arxiv.org/abs/2012.11854
- Code: https://github.com/UCSC-REAL/CAL

<a name="Video-Stabilization"></a>

# 视频稳像(Video Stabilization)

**Real-Time Selfie Video Stabilization**

- Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Yu_Real-Time_Selfie_Video_Stabilization_CVPR_2021_paper.pdf
- Code: https://github.com/jiy173/selfievideostabilization

<a name="Datasets"></a>

# 数据集(Datasets)

**Tracking Pedestrian Heads in Dense Crowd**

- Homepage: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sundararaman_Tracking_Pedestrian_Heads_in_Dense_Crowd_CVPR_2021_paper.html
- Code1: https://github.com/Sentient07/HeadHunter
- Code2: https://github.com/Sentient07/HeadHunter--T
- Dataset: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/

**Part-aware Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2106.06351
- Code: https://github.com/tue-mps/panoptic_parts
- Dataset: https://github.com/tue-mps/panoptic_parts

**Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos**

- Homepage: https://www.yasamin.page/hdnet_tiktok
- Paper(Oral): https://arxiv.org/abs/2103.03319
- Code: https://github.com/yasaminjafarian/HDNet_TikTok
- Dataset: https://www.yasamin.page/hdnet_tiktok#h.jr9ifesshn7v

**High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network**

- Paper: https://arxiv.org/abs/2105.09188
- Code: https://github.com/csjliang/LPTN
- Dataset: https://github.com/csjliang/LPTN

**Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark**

- Paper: https://arxiv.org/abs/2105.02440
- Code: https://github.com/VisDrone/DroneCrowd
- Dataset: https://github.com/VisDrone/DroneCrowd

**Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets**

- Homepage: https://fidler-lab.github.io/efficient-annotation-cookbook/
- Paper(Oral): https://arxiv.org/abs/2104.12690
- Code: https://github.com/fidler-lab/efficient-annotation-cookbook

**ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2012.05258
- Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab
- Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab

**Learning To Count Everything**

- Paper: https://arxiv.org/abs/2104.08391
- Code: https://github.com/cvlab-stonybrook/LearningToCountEverything
- Dataset: https://github.com/cvlab-stonybrook/LearningToCountEverything

**Semantic Image Matting**

- Paper: https://arxiv.org/abs/2104.08201
- Code: https://github.com/nowsyn/SIM
- Dataset: https://github.com/nowsyn/SIM

**Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline**

- Homepage: http://mepro.bjtu.edu.cn/resource.html
- Paper: https://arxiv.org/abs/2104.06174
- Code: None

**Visual Semantic Role Labeling for Video Understanding**

- Homepage: https://vidsitu.org/
- Paper: https://arxiv.org/abs/2104.00990
- Code: https://github.com/TheShadow29/VidSitu
- Dataset: https://github.com/TheShadow29/VidSitu

**VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild**

- Homepage: https://www.vspwdataset.com/
- Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf
- GitHub: https://github.com/sssdddwww2/vspw_dataset_download

**Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark**

- Homepage: https://vap.aau.dk/sewer-ml/
- Paper: https://arxiv.org/abs/2103.10895

**Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food**

- Paper: https://arxiv.org/abs/2103.03375
- Dataset: None

**Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges**

- Homepage: https://github.com/QingyongHu/SensatUrban
- Paper: http://arxiv.org/abs/2009.03137
- Code: https://github.com/QingyongHu/SensatUrban
- Dataset: https://github.com/QingyongHu/SensatUrban

**When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework**

- Paper(Oral): https://arxiv.org/abs/2103.01520
- Code: https://github.com/Hzzone/MTLFace
- Dataset: https://github.com/Hzzone/MTLFace

**Depth from Camera Motion and Object Detection**

- Paper: https://arxiv.org/abs/2103.01468
- Code: https://github.com/griffbr/ODMD
- Dataset: https://github.com/griffbr/ODMD

**There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge**

- Homepage: http://rl.uni-freiburg.de/research/multimodal-distill
- Paper: https://arxiv.org/abs/2103.01353
- Code: http://rl.uni-freiburg.de/research/multimodal-distill

**Scan2Cap: Context-aware Dense Captioning in RGB-D Scans**

- Paper: https://arxiv.org/abs/2012.02206
- Code: https://github.com/daveredrum/Scan2Cap

- Dataset: https://github.com/daveredrum/ScanRefer

<a name="Others"></a>

# 其他(Others)

**Fast and Accurate Model Scaling**

- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Dollar_Fast_and_Accurate_Model_Scaling_CVPR_2021_paper.html

- Code: https://github.com/facebookresearch/pycls

**Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos**

- Homepage: https://www.yasamin.page/hdnet_tiktok

- Paper(Oral): https://arxiv.org/abs/2103.03319

- Code: https://github.com/yasaminjafarian/HDNet_TikTok

- Dataset: https://www.yasamin.page/hdnet_tiktok#h.jr9ifesshn7v

**Omnimatte: Associating Objects and Their Effects in Video**

- Homepage: https://omnimatte.github.io/

- Paper(Oral): https://arxiv.org/abs/2105.06993
- Code: https://omnimatte.github.io/#code

**Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets**

- Homepage: https://fidler-lab.github.io/efficient-annotation-cookbook/
- Paper(Oral): https://arxiv.org/abs/2104.12690
- Code: https://github.com/fidler-lab/efficient-annotation-cookbook

**Motion Representations for Articulated Animation**

- Paper: https://arxiv.org/abs/2104.11280
- Code: https://github.com/snap-research/articulated-animation

**Deep Lucas-Kanade Homography for Multimodal Image Alignment**

- Paper: https://arxiv.org/abs/2104.11693
- Code: https://github.com/placeforyiming/CVPR21-Deep-Lucas-Kanade-Homography

**Skip-Convolutions for Efficient Video Processing**

- Paper: https://arxiv.org/abs/2104.11487
- Code: None

**KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control**

- Homepage: http://tomasjakab.github.io/KeypointDeformer

- Paper(Oral): https://arxiv.org/abs/2104.11224
- Code: https://github.com/tomasjakab/keypoint_deformer/

**Learning To Count Everything**

- Paper: https://arxiv.org/abs/2104.08391
- Code: https://github.com/cvlab-stonybrook/LearningToCountEverything
- Dataset: https://github.com/cvlab-stonybrook/LearningToCountEverything

**SOLD2: Self-supervised Occlusion-aware Line Description and Detection**

- Paper(Oral): https://arxiv.org/abs/2104.03362
- Code: https://github.com/cvg/SOLD2

**Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression**

- Homepage: https://li-wanhua.github.io/POEs/
- Paper: https://arxiv.org/abs/2103.13629
- Code: https://github.com/Li-Wanhua/POEs

**LEAP: Learning Articulated Occupancy of People**

- Paper: https://arxiv.org/abs/2104.06849
- Code: None

**Visual Semantic Role Labeling for Video Understanding**

- Homepage: https://vidsitu.org/

- Paper: https://arxiv.org/abs/2104.00990
- Code: https://github.com/TheShadow29/VidSitu
- Dataset: https://github.com/TheShadow29/VidSitu

**UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles**

- Paper: https://arxiv.org/abs/2104.00946
- Code: https://github.com/SUTDCV/UAV-Human 

**Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning**

- Paper(Oral): https://arxiv.org/abs/2104.00924
- Code: None

**Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction**

- Paper: https://arxiv.org/abs/2104.00858
- Code: None

**Towards High Fidelity Face Relighting with Realistic Shadows**

- Paper: https://arxiv.org/abs/2104.00825
- Code: None

**BRepNet: A topological message passing system for solid models**

- Paper(Oral): https://arxiv.org/abs/2104.00706
- Code: None

**Visually Informed Binaural Audio Generation without Binaural Audios**

- Homepage: https://sheldontsui.github.io/projects/PseudoBinaural
- Paper: None

- GitHub: https://github.com/SheldonTsui/PseudoBinaural_CVPR2021
- Demo: https://www.youtube.com/watch?v=r-uC2MyAWQc

**Exploring intermediate representation for monocular vehicle pose estimation**

- Paper: None
- Code: https://github.com/Nicholasli1995/EgoNet

**Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGB**

- Paper(Oral): https://arxiv.org/abs/2103.14708
- Code: None

**Invertible Image Signal Processing**

- Paper: https://arxiv.org/abs/2103.15061
- Code: https://github.com/yzxing87/Invertible-ISP

**Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling**

- Paper: https://arxiv.org/abs/2103.14858
- Code: None

**SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences**

- Paper: https://arxiv.org/abs/2103.14898
- Code: None

**Embedding Transfer with Label Relaxation for Improved Metric Learning**

- Paper: https://arxiv.org/abs/2103.14908
- Code: None

**Picasso: A CUDA-based Library for Deep Learning over 3D Meshes**

- Paper: https://arxiv.org/abs/2103.15076 
- Code: https://github.com/hlei-ziyan/Picasso

**Meta-Mining Discriminative Samples for Kinship Verification**

- Paper: https://arxiv.org/abs/2103.15108
- Code: None

**Cloud2Curve: Generation and Vectorization of Parametric Sketches**

- Paper: https://arxiv.org/abs/2103.15536
- Code: None

**TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events**

- Paper: https://arxiv.org/abs/2103.15538
- Code: https://github.com/SUTDCV/SUTD-TrafficQA

**Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution**

- Homepage: http://wellyzhang.github.io/project/prae.html

- Paper: https://arxiv.org/abs/2103.14230
- Code: None

**ACRE: Abstract Causal REasoning Beyond Covariation**

- Homepage: http://wellyzhang.github.io/project/acre.html

- Paper: https://arxiv.org/abs/2103.14232
- Code: None

**Confluent Vessel Trees with Accurate Bifurcations**

- Paper: https://arxiv.org/abs/2103.14268
- Code: None

**Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling**

- Paper: https://arxiv.org/abs/2103.14338
- Code: https://github.com/HuangZhiChao95/FewShotMotionTransfer

**Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks**

- Homepage: https://paschalidoud.github.io/neural_parts
- Paper: None 
- Code: https://github.com/paschalidoud/neural_parts 

**Knowledge Evolution in Neural Networks**

- Paper(Oral): https://arxiv.org/abs/2103.05152
- Code: https://github.com/ahmdtaha/knowledge_evolution

**Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning**

- Paper: https://arxiv.org/abs/2103.02148
- Code: https://github.com/guopengf/FLMRCM

**SGP: Self-supervised Geometric Perception**

- Paper(Oral): https://arxiv.org/abs/2103.03114
- Code: https://github.com/theNded/SGP

**Diffusion Probabilistic Models for 3D Point Cloud Generation**

- Paper: https://arxiv.org/abs/2103.01458
- Code: https://github.com/luost26/diffusion-point-cloud

**Scan2Cap: Context-aware Dense Captioning in RGB-D Scans**

- Paper: https://arxiv.org/abs/2012.02206
- Code: https://github.com/daveredrum/Scan2Cap

- Dataset: https://github.com/daveredrum/ScanRefer

**There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge**

- Paper: https://arxiv.org/abs/2103.01353
- Code: http://rl.uni-freiburg.de/research/multimodal-distill

- Dataset: http://rl.uni-freiburg.de/research/multimodal-distill

<a name="TO-DO"></a>

# 待添加(TODO)

- [Big news! 20 papers from Tencent Youtu accepted to CVPR 2021](https://mp.weixin.qq.com/s/McAtOVh0osWZ3uppEoHC8A)
- [Three papers from the MePro team accepted to CVPR 2021](https://mp.weixin.qq.com/s/GD5Zb6u_MQ8GZIAGeCGo3Q)

<a name="Not-Sure"></a>

# 不确定中没中(Not Sure Whether Accepted)

**CT Film Recovery via Disentangling Geometric Deformation and Photometric Degradation: Simulated Datasets and Deep Models**

- Paper: None
- Code: https://github.com/transcendentsky/Film-Recovery

**Toward Explainable Reflection Removal with Distilling and Model Uncertainty**

- Paper: None
- Code: https://github.com/ytpeng-aimlab/CVPR-2021-Toward-Explainable-Reflection-Removal-with-Distilling-and-Model-Uncertainty

**DeepOIS: Gyroscope-Guided Deep Optical Image Stabilizer Compensation**

- Paper: None
- Code: https://github.com/lhaippp/DeepOIS

**Exploring Adversarial Fake Images on Face Manifold**

- Paper: None
- Code: https://github.com/ldz666666/Style-atk

**Uncertainty-Aware Semi-Supervised Crowd Counting via Consistency-Regularized Surrogate Task**

- Paper: None
- Code: https://github.com/yandamengdanai/Uncertainty-Aware-Semi-Supervised-Crowd-Counting-via-Consistency-Regularized-Surrogate-Task

**Temporal Contrastive Graph for Self-supervised Video Representation Learning**

- Paper: None
- Code: https://github.com/YangLiu9208/TCG

**Boosting Monocular Depth Estimation Models to High-Resolution via Context-Aware Patching**

- Paper: None
- Code: https://github.com/ouranonymouscvpr/cvpr2021_ouranonymouscvpr

**Fast and Memory-Efficient Compact Bilinear Pooling**

- Paper: None
- Code: https://github.com/cvpr2021kp2/cvpr2021kp2

**Identification of Empty Shelves in Supermarkets using Domain-inspired Features with Structural Support Vector Machine**

- Paper: None
- Code: https://github.com/gapDetection/cvpr2021

**Estimating A Child's Growth Potential From Cephalometric X-Ray Image via Morphology-Aware Interactive Keypoint Estimation**

- Paper: None
- Code: https://github.com/interactivekeypoint2020/Morph

- https://github.com/ShaoQiangShen/CVPR2021
- https://github.com/gillesflash/CVPR2021
- https://github.com/anonymous-submission1991/BaLeNAS
- https://github.com/cvpr2021dcb/cvpr2021dcb
- https://github.com/anonymousauthorCV/CVPR2021_PaperID_8578
- https://github.com/AldrichZeng/FreqPrune
- https://github.com/Anonymous-AdvCAM/Anonymous-AdvCAM
- https://github.com/ddfss/datadrive-fss


================================================
FILE: CVPR2022-Papers-with-Code.md
================================================
# CVPR 2022 论文和开源项目合集(Papers with Code)

A collection of [CVPR 2022](https://cvpr2022.thecvf.com/) papers and open-source projects (papers with code)!

List of accepted CVPR 2022 paper IDs: https://drive.google.com/file/d/15JFhfPboKdUcIH9LdbCMUFmGq_JhaxhC/view

> Note 1: Everyone is welcome to open an issue to share CVPR 2022 papers and open-source projects!
>
> Note 2: For papers from previous top CV conferences, as well as other high-quality CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision
>
> - [CVPR 2019](CVPR2019-Papers-with-Code.md)
> - [CVPR 2020](CVPR2020-Papers-with-Code.md)
> - [CVPR 2021](CVPR2021-Papers-with-Code.md)

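If you want to keep up with the latest high-quality CV papers, open-source projects, and learning resources, scan the QR code to join the CVer Academic Exchange Group (CVer学术交流群)! Let's learn from each other and improve together~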

![](CVer学术交流群.png)

## 【CVPR 2022 Open-Source Paper Index】

- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [GAN](#GAN)
- [GNN](#GNN)
- [MLP](#MLP)
- [NAS](#NAS)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [3D Face](#3D%20Face)
- [长尾分布(Long-Tail)](#Long-Tail)
- [Visual Transformer](#Visual-Transformer)
- [视觉和语言(Vision-Language)](#VL)
- [自监督学习(Self-supervised Learning)](#SSL)
- [数据增强(Data Augmentation)](#DA)
- [知识蒸馏(Knowledge Distillation)](#KD)
- [目标检测(Object Detection)](#Object-Detection)
- [目标跟踪(Visual Tracking)](#VT)
- [语义分割(Semantic Segmentation)](#Semantic-Segmentation)
- [实例分割(Instance Segmentation)](#Instance-Segmentation)
- [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation)
- [小样本分类(Few-Shot Classification)](#FFC)
- [小样本分割(Few-Shot Segmentation)](#FFS)
- [图像抠图(Image Matting)](#Matting)
- [视频理解(Video Understanding)](#VU)
- [图像编辑(Image Editing)](#Image-Editing)
- [Low-level Vision](#LLV)
- [超分辨率(Super-Resolution)](#Super-Resolution)
- [去模糊(Deblur)](#Deblur)
- [3D点云(3D Point Cloud)](#3D-Point-Cloud)
- [3D目标检测(3D Object Detection)](#3D-Object-Detection)
- [3D语义分割(3D Semantic Segmentation)](#3DSS)
- [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking)
- [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation)
- [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC)
- [3D重建(3D Reconstruction)](#3D-R)
- [行人重识别(Person Re-identification)](#ReID)
- [伪装物体检测(Camouflaged Object Detection)](#COD)
- [深度估计(Depth Estimation)](#Depth-Estimation)
- [立体匹配(Stereo Matching)](#Stereo-Matching)
- [特征匹配(Feature Matching)](#FM)
- [车道线检测(Lane Detection)](#Lane-Detection)
- [光流估计(Optical Flow Estimation)](#Optical-Flow-Estimation)
- [图像修复(Image Inpainting)](#Image-Inpainting)
- [图像检索(Image Retrieval)](#Image-Retrieval)
- [人脸识别(Face Recognition)](#Face-Recognition)
- [人群计数(Crowd Counting)](#Crowd-Counting)
- [医学图像(Medical Image)](#Medical-Image)
- [视频生成(Video Generation)](#Video%20Generation)
- [场景图生成(Scene Graph Generation)](#Scene-Graph-Generation)
- [参考视频目标分割(Referring Video Object Segmentation)](#R-VOS)
- [步态识别(Gait Recognition)](#GR)
- [风格迁移(Style Transfer)](#ST)
- [异常检测(Anomaly Detection)](#AD)
- [对抗样本(Adversarial Examples)](#AE)
- [弱监督物体定位(Weakly Supervised Object Localization)](#WSOL)
- [雷达目标检测(Radar Object Detection)](#ROD)
- [高光谱图像重建(Hyperspectral Image Reconstruction)](#HSI)
- [图像拼接(Image Stitching)](#Image-Stitching)
- [水印(Watermarking)](#Watermarking)
- [Action Counting](#AC)
- [Grounded Situation Recognition](#GSR)
- [Zero-shot Learning](#ZSL)
- [DeepFakes](#DeepFakes)
- [数据集(Datasets)](#Datasets)
- [新任务(New Tasks)](#New-Tasks)
- [其他(Others)](#Others)

<a name="Backbone"></a>

# Backbone

**A ConvNet for the 2020s**

- Paper: https://arxiv.org/abs/2201.03545
- Code: https://github.com/facebookresearch/ConvNeXt
- Chinese write-up: https://mp.weixin.qq.com/s/Xg5wPYExnvTqRo6s5-2cAw

**Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs**

- Paper: https://arxiv.org/abs/2203.06717

- Code: https://github.com/megvii-research/RepLKNet
- Code2: https://github.com/DingXiaoH/RepLKNet-pytorch

- Chinese write-up: https://mp.weixin.qq.com/s/_qXyIQut-JRW6VvsjaQlFg

**MPViT: Multi-Path Vision Transformer for Dense Prediction**

- Paper: https://arxiv.org/abs/2112.11010
- Code: https://github.com/youngwanLEE/MPViT
- Chinese write-up: https://mp.weixin.qq.com/s/Q9-crEOz5IYzZaNoq8oXfg

**Mobile-Former: Bridging MobileNet and Transformer**

- Paper: https://arxiv.org/abs/2108.05895
- Code: None
- Chinese write-up: https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ

**MetaFormer is Actually What You Need for Vision**

- Paper: https://arxiv.org/abs/2111.11418
- Code: https://github.com/sail-sg/poolformer

**Shunted Self-Attention via Multi-Scale Token Aggregation**

- Paper(Oral): https://arxiv.org/abs/2111.15193
- Code: https://github.com/OliverRensu/Shunted-Transformer

**TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing**

- Paper: http://arxiv.org/abs/2203.10489
- Code: https://github.com/JierunChen/TVConv

**Learned Queries for Efficient Local Attention**

- Paper(Oral): https://arxiv.org/abs/2112.11435
- Code: https://github.com/moabarar/qna

**RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality**

- Paper: https://arxiv.org/abs/2112.11081
- Code: https://github.com/DingXiaoH/RepMLP

<a name="CLIP"></a>

# CLIP

**HairCLIP: Design Your Hair by Text and Reference Image**

- Paper: https://arxiv.org/abs/2112.05142

- Code: https://github.com/wty-ustc/HairCLIP

**PointCLIP: Point Cloud Understanding by CLIP**

- Paper: https://arxiv.org/abs/2112.02413
- Code: https://github.com/ZrrSkywalker/PointCLIP

**Blended Diffusion for Text-driven Editing of Natural Images**

- Paper: https://arxiv.org/abs/2111.14818

- Code: https://github.com/omriav/blended-diffusion

<a name="GAN"></a>

# GAN

**SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing**

- Homepage: https://semanticstylegan.github.io/

- Paper: https://arxiv.org/abs/2112.02236
- Demo: https://semanticstylegan.github.io/videos/demo.mp4

**Style Transformer for Image Inversion and Editing**

- Paper: https://arxiv.org/abs/2203.07932
- Code: https://github.com/sapphire497/style-transformer

**Unsupervised Image-to-Image Translation with Generative Prior**

- Homepage: https://www.mmlab-ntu.com/project/gpunit/
- Paper: https://arxiv.org/abs/2204.03641
- Code: https://github.com/williamyang1991/GP-UNIT

**StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2**

- Homepage: https://universome.github.io/stylegan-v
- Paper: https://arxiv.org/abs/2112.14683
- Code: https://github.com/universome/stylegan-v

**OSSGAN: Open-set Semi-supervised Image Generation**

- Paper: https://arxiv.org/abs/2204.14249
- Code: https://github.com/raven38/OSSGAN

**Neural Texture Extraction and Distribution for Controllable Person Image Synthesis**

- Paper: https://arxiv.org/abs/2204.06160
- Code: https://github.com/RenYurui/Neural-Texture-Extraction-Distribution

<a name="GNN"></a>

# GNN

**OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks**

- Paper: https://wanyu-lin.github.io/assets/publications/wanyu-cvpr2022.pdf 
- Code: https://github.com/WanyuGroup/CVPR2022-OrphicX

<a name="MLP"></a>

# MLP

**RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality**

- Paper: https://arxiv.org/abs/2112.11081
- Code: https://github.com/DingXiaoH/RepMLP

<a name="NAS"></a>

# NAS

**β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search**

- Paper: https://arxiv.org/abs/2203.01665
- Code: https://github.com/Sunshine-Ye/Beta-DARTS

**ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior**

- Paper: https://arxiv.org/abs/2111.15362
- Code: None

<a name="OCR"></a>

# OCR

**SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition**

- Paper: https://arxiv.org/abs/2203.10209

- Code: https://github.com/mxin262/SwinTextSpotter

<a name="NeRF"></a>

# NeRF

**Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields**

- Homepage: https://jonbarron.info/mipnerf360/
- Paper: https://arxiv.org/abs/2111.12077

- Demo: https://youtu.be/YStDS2-Ln1s

**Point-NeRF: Point-based Neural Radiance Fields**

- Homepage: https://xharlie.github.io/projects/project_sites/pointnerf/
- Paper: https://arxiv.org/abs/2201.08845
- Code: https://github.com/Xharlie/point-nerf

**NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images**

- Paper: https://arxiv.org/abs/2111.13679
- Homepage: https://bmild.github.io/rawnerf/
- Demo: https://www.youtube.com/watch?v=JtBS4KBcKVc

**Urban Radiance Fields**

- Homepage: https://urban-radiance-fields.github.io/

- Paper: https://arxiv.org/abs/2111.14643
- Demo: https://youtu.be/qGlq5DZT6uc

**Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation**

- Paper: https://arxiv.org/abs/2202.13162
- Code: https://github.com/HexagonPrime/Pix2NeRF

**HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video**

- Homepage: https://grail.cs.washington.edu/projects/humannerf/
- Paper: https://arxiv.org/abs/2201.04127

- Demo: https://youtu.be/GM-RoZEymmw

<a name="3D Face"></a>

# 3D Face

**ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural Representations**

- Paper: https://arxiv.org/abs/2203.14510
- Code: https://github.com/MingwuZheng/ImFace 

<a name="Long-Tail"></a>

# 长尾分布(Long-Tail)

**Retrieval Augmented Classification for Long-Tail Visual Recognition**

- Paper: https://arxiv.org/abs/2202.11233
- Code: None

<a name="Visual-Transformer"></a>

# Visual Transformer

## Backbone

**MPViT: Multi-Path Vision Transformer for Dense Prediction**

- Paper: https://arxiv.org/abs/2112.11010
- Code: https://github.com/youngwanLEE/MPViT

**MetaFormer is Actually What You Need for Vision**

- Paper: https://arxiv.org/abs/2111.11418
- Code: https://github.com/sail-sg/poolformer

**Mobile-Former: Bridging MobileNet and Transformer**

- Paper: https://arxiv.org/abs/2108.05895
- Code: None
- Chinese write-up: https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ

**Shunted Self-Attention via Multi-Scale Token Aggregation**

- Paper(Oral): https://arxiv.org/abs/2111.15193
- Code: https://github.com/OliverRensu/Shunted-Transformer

**Learned Queries for Efficient Local Attention**

- Paper(Oral): https://arxiv.org/abs/2112.11435
- Code: https://github.com/moabarar/qna

## 应用(Application)

**Language-based Video Editing via Multi-Modal Multi-Level Transformer**

- Paper: https://arxiv.org/abs/2104.01122
- Code: None

**MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video**

- Paper: https://arxiv.org/abs/2203.00859
- Code: None

**Embracing Single Stride 3D Object Detector with Sparse Transformer**

- Paper: https://arxiv.org/abs/2112.06375
- Code: https://github.com/TuSimple/SST
- Chinese write-up: https://zhuanlan.zhihu.com/p/476056546

**Multi-class Token Transformer for Weakly Supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2203.02891
- Code: https://github.com/xulianuwa/MCTformer

**Spatio-temporal Relation Modeling for Few-shot Action Recognition**

- Paper: https://arxiv.org/abs/2112.05132
- Code: https://github.com/Anirudh257/strm

**Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction**

- Paper: https://arxiv.org/abs/2111.07910
- Code: https://github.com/caiyuanhao1998/MST

**Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling**

- Homepage: https://point-bert.ivg-research.xyz/
- Paper: https://arxiv.org/abs/2111.14819
- Code: https://github.com/lulutang0608/Point-BERT

**GroupViT: Semantic Segmentation Emerges from Text Supervision**

- Homepage: https://jerryxu.net/GroupViT/

- Paper: https://arxiv.org/abs/2202.11094
- Demo: https://youtu.be/DtJsWIUTW-Y

**Restormer: Efficient Transformer for High-Resolution Image Restoration**

- Paper: https://arxiv.org/abs/2111.09881
- Code: https://github.com/swz30/Restormer

**Splicing ViT Features for Semantic Appearance Transfer**

- Homepage: https://splice-vit.github.io/
- Paper: https://arxiv.org/abs/2201.00424
- Code: https://github.com/omerbt/Splice

**Self-supervised Video Transformer**

- Homepage: https://kahnchana.github.io/svt/
- Paper: https://arxiv.org/abs/2112.01514

- Code: https://github.com/kahnchana/svt

**Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers**

- Paper: https://arxiv.org/abs/2203.02664
- Code: https://github.com/rulixiang/afa

**Accelerating DETR Convergence via Semantic-Aligned Matching**

- Paper: https://arxiv.org/abs/2203.06883
- Code: https://github.com/ZhangGongjie/SAM-DETR

**DN-DETR: Accelerate DETR Training by Introducing Query DeNoising**

- Paper: https://arxiv.org/abs/2203.01305
- Code: https://github.com/FengLi-ust/DN-DETR
- Chinese write-up: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w

**Style Transformer for Image Inversion and Editing**

- Paper: https://arxiv.org/abs/2203.07932
- Code: https://github.com/sapphire497/style-transformer

**MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer**

- Paper: https://arxiv.org/abs/2203.10981

- Code: https://github.com/kuanchihhuang/MonoDTR

**Mask Transfiner for High-Quality Instance Segmentation**

- Paper: https://arxiv.org/abs/2111.13673
- Code: https://github.com/SysCV/transfiner

**Language as Queries for Referring Video Object Segmentation**

- Paper: https://arxiv.org/abs/2201.00487
- Code: https://github.com/wjn922/ReferFormer
- Chinese write-up: https://mp.weixin.qq.com/s/MkQT8QWSYoYVhJ1RSF6oPQ

**X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning**

- Paper: https://arxiv.org/abs/2203.00843
- Code: https://github.com/CurryYuan/X-Trans2Cap

**AdaMixer: A Fast-Converging Query-Based Object Detector**

- Paper(Oral): https://arxiv.org/abs/2203.16507
- Code: https://github.com/MCG-NJU/AdaMixer

**Omni-DETR: Omni-Supervised Object Detection with Transformers**

- Paper: https://arxiv.org/abs/2203.16089
- Code: https://github.com/amazon-research/omni-detr

**SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition**

- Paper: https://arxiv.org/abs/2203.10209

- Code: https://github.com/mxin262/SwinTextSpotter

**TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting**

- Paper(Oral): https://arxiv.org/abs/2204.01018
- Code: https://github.com/SvipRepetitionCounting/TransRAC

**Collaborative Transformers for Grounded Situation Recognition**

- Paper: https://arxiv.org/abs/2203.16518
- Code: https://github.com/jhcho99/CoFormer

**NFormer: Robust Person Re-identification with Neighbor Transformer**

- Paper: https://arxiv.org/abs/2204.09331
- Code: https://github.com/haochenheheda/NFormer

**Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation**

- Paper: https://arxiv.org/abs/2201.06889
- Code: None

**Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer**

- Paper(Oral): https://arxiv.org/abs/2204.08680
- Code: https://github.com/zengwang430521/TCFormer

**A New Dataset and Transformer for Stereoscopic Video Super-Resolution**

- Paper: https://arxiv.org/abs/2204.10039
- Code: https://github.com/H-deep/Trans-SVSR/
- Dataset: http://shorturl.at/mpwGX

**Safe Self-Refinement for Transformer-based Domain Adaptation**

- Paper: https://arxiv.org/abs/2204.07683
- Code: https://github.com/tsun/SSRT

**Fast Point Transformer**

- Homepage: http://cvlab.postech.ac.kr/research/FPT/
- Paper: https://arxiv.org/abs/2112.04702
- Code: https://github.com/POSTECH-CVLab/FastPointTransformer

**Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval**

- Paper: https://arxiv.org/abs/2204.09730
- Code: https://github.com/mshukor/TFood

**DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation**

- Paper: https://arxiv.org/abs/2111.14887
- Code: https://github.com/lhoyer/DAFormer

**Stratified Transformer for 3D Point Cloud Segmentation**

- Paper: https://arxiv.org/abs/2203.14508
- Code: https://github.com/dvlab-research/Stratified-Transformer 

<a name="VL"></a>

# 视觉和语言(Vision-Language)

**Conditional Prompt Learning for Vision-Language Models**

- Paper: https://arxiv.org/abs/2203.05557
- Code: https://github.com/KaiyangZhou/CoOp

**Bridging Video-text Retrieval with Multiple Choice Question**

- Paper: https://arxiv.org/abs/2201.04850
- Code: https://github.com/TencentARC/MCQ

**Visual Abductive Reasoning**

- Paper: https://arxiv.org/abs/2203.14040
- Code: https://github.com/leonnnop/VAR

<a name="SSL"></a>

# 自监督学习(Self-supervised Learning)

**UniVIP: A Unified Framework for Self-Supervised Visual Pre-training**

- Paper: https://arxiv.org/abs/2203.06965
- Code: None

**Crafting Better Contrastive Views for Siamese Representation Learning**

- Paper: https://arxiv.org/abs/2202.03278
- Code: https://github.com/xyupeng/ContrastiveCrop
- Chinese write-up: https://mp.weixin.qq.com/s/VTP9D5f7KG9vg30U9kVI2A

**HCSC: Hierarchical Contrastive Selective Coding**

- Homepage: https://github.com/gyfastas/HCSC
- Paper: https://arxiv.org/abs/2202.00455
- Chinese write-up: https://mp.weixin.qq.com/s/jkYR8mYp-e645qk8kfPNKQ

**DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis**

- Paper: https://arxiv.org/abs/2204.10437

- Code: https://github.com/JLiangLab/DiRA

<a name="DA"></a>

# 数据增强(Data Augmentation)

**TeachAugment: Data Augmentation Optimization Using Teacher Knowledge**

- Paper: https://arxiv.org/abs/2202.12513
- Code: https://github.com/DensoITLab/TeachAugment

**AlignMixup: Improving Representations By Interpolating Aligned Features**

- Paper: https://arxiv.org/abs/2103.15375
- Code: https://github.com/shashankvkt/AlignMixup_CVPR22 

<a name="KD"></a>

# 知识蒸馏(Knowledge Distillation)

**Decoupled Knowledge Distillation**

- Paper: https://arxiv.org/abs/2203.08679
- Code: https://github.com/megvii-research/mdistiller
- Chinese write-up: https://mp.weixin.qq.com/s/-4AA0zKIXh9Ei9-vc5jOhw

<a name="Object-Detection"></a>

# 目标检测(Object Detection)

**BoxeR: Box-Attention for 2D and 3D Transformers**

- Paper: https://arxiv.org/abs/2111.13087
- Code: https://github.com/kienduynguyen/BoxeR
- Chinese write-up: https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w

**DN-DETR: Accelerate DETR Training by Introducing Query DeNoising**

- Paper: https://arxiv.org/abs/2203.01305
- Code: https://github.com/FengLi-ust/DN-DETR
- Chinese write-up: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w

**Accelerating DETR Convergence via Semantic-Aligned Matching**

- Paper: https://arxiv.org/abs/2203.06883
- Code: https://github.com/ZhangGongjie/SAM-DETR

**Localization Distillation for Dense Object Detection**

- Paper: https://arxiv.org/abs/2102.12252
- Code: https://github.com/HikariTJU/LD
- Chinese write-up: https://mp.weixin.qq.com/s/dxss8RjJH283h6IbPCT9vg

**Focal and Global Knowledge Distillation for Detectors**

- Paper: https://arxiv.org/abs/2111.11837
- Code: https://github.com/yzd-v/FGD
- Chinese write-up: https://mp.weixin.qq.com/s/yDkreTudC8JL2V2ETsADwQ

**A Dual Weighting Label Assignment Scheme for Object Detection**

- Paper: https://arxiv.org/abs/2203.09730
- Code: https://github.com/strongwolf/DW

**AdaMixer: A Fast-Converging Query-Based Object Detector**

- Paper(Oral): https://arxiv.org/abs/2203.16507
- Code: https://github.com/MCG-NJU/AdaMixer

**Omni-DETR: Omni-Supervised Object Detection with Transformers**

- Paper: https://arxiv.org/abs/2203.16089
- Code: https://github.com/amazon-research/omni-detr

**SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection**

- Paper(Oral): https://arxiv.org/abs/2203.06398
- Code: https://github.com/CityU-AIM-Group/SIGMA

## 半监督目标检测(Semi-supervised Object Detection)

**Dense Learning based Semi-Supervised Object Detection**

- Paper: https://arxiv.org/abs/2204.07300

- Code: https://github.com/chenbinghui1/DSL

<a name="VT"></a>

# 目标跟踪(Visual Tracking)

**Correlation-Aware Deep Tracking**

- Paper: https://arxiv.org/abs/2203.01666
- Code: None

**TCTrack: Temporal Contexts for Aerial Tracking**

- Paper: https://arxiv.org/abs/2203.01885
- Code: https://github.com/vision4robotics/TCTrack

## 多模态目标跟踪(Multi-modal Object Tracking)

**Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline**

- Homepage: https://zhang-pengyu.github.io/DUT-VTUAV/

- Paper: https://arxiv.org/abs/2204.04120

## 多目标跟踪(Multi-Object Tracking)

**Learning of Global Objective for Network Flow in Multi-Object Tracking**

- Paper: https://arxiv.org/abs/2203.16210
- Code: None

**DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion**

- Homepage: https://dancetrack.github.io
- Paper: https://arxiv.org/abs/2111.14690
- Dataset: https://github.com/DanceTrack/DanceTrack

<a name="Semantic-Segmentation"></a>

# 语义分割(Semantic Segmentation)

**Novel Class Discovery in Semantic Segmentation**

- Homepage: https://ncdss.github.io/
- Paper: https://arxiv.org/abs/2112.01900
- Code: https://github.com/HeliosZhao/NCDSS

**Deep Hierarchical Semantic Segmentation**

- Paper: https://arxiv.org/abs/2203.14335
- Code: https://github.com/0liliulei/HieraSeg 

**Rethinking Semantic Segmentation: A Prototype View**

- Paper(Oral): https://arxiv.org/abs/2203.15102
- Code: https://github.com/tfzhou/ProtoSeg

## 弱监督语义分割(Weakly Supervised Semantic Segmentation)

**Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2203.00962
- Code: https://github.com/zhaozhengChen/ReCAM

**Multi-class Token Transformer for Weakly Supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2203.02891
- Code: https://github.com/xulianuwa/MCTformer

**Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers**

- Paper: https://arxiv.org/abs/2203.02664
- Code: https://github.com/rulixiang/afa

**CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2203.02668
- Code: https://github.com/CVI-SZU/CLIMS 

**CCAM: Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation**

- Paper: https://arxiv.org/abs/2203.13505
- Code: https://github.com/CVI-SZU/CCAM 

**FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation**

- Homepage: http://cvlab.postech.ac.kr/research/FIFO/
- Paper(Oral): https://arxiv.org/abs/2204.01587
- Code: https://github.com/sohyun-l/FIFO 

**Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2203.09653
- Code: https://github.com/maeve07/RCA.git

## 半监督语义分割(Semi-Supervised Semantic Segmentation)

**ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2106.05095
- Code: https://github.com/LiheYoung/ST-PlusPlus
- 中文解读：https://mp.weixin.qq.com/s/knSnlebdtEnmrkChGM_0CA

**Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels**

- Homepage: https://haochen-wang409.github.io/U2PL/
- Paper: https://arxiv.org/abs/2203.03884
- Code: https://github.com/Haochen-Wang409/U2PL
- 中文解读: https://mp.weixin.qq.com/s/-08olqE7np8A1XQzt6HAgQ

**Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation**

- Paper: https://arxiv.org/pdf/2111.12903.pdf
- Code: https://github.com/yyliu01/PS-MT

## 域自适应语义分割(Domain Adaptive Semantic Segmentation)

**Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation**

- Paper: https://arxiv.org/abs/2111.12940
- Code: https://github.com/BIT-DA/RIPU

**DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation**

- Paper: https://arxiv.org/abs/2111.14887
- Code: https://github.com/lhoyer/DAFormer

## 无监督语义分割(Unsupervised Semantic Segmentation)

**GroupViT: Semantic Segmentation Emerges from Text Supervision**

- Homepage: https://jerryxu.net/GroupViT/
- Paper: https://arxiv.org/abs/2202.11094
- Demo: https://youtu.be/DtJsWIUTW-Y

## 少样本语义分割(Few-Shot Semantic Segmentation)

**Generalized Few-shot Semantic Segmentation**

- Paper: https://jiaya.me/papers/cvpr22_zhuotao.pdf
- Code: https://github.com/dvlab-research/GFS-Seg 

<a name="Instance-Segmentation"></a>

# 实例分割(Instance Segmentation)

**BoxeR: Box-Attention for 2D and 3D Transformers**

- Paper: https://arxiv.org/abs/2111.13087
- Code: https://github.com/kienduynguyen/BoxeR
- 中文解读：https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w

**E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation**

- Paper: https://arxiv.org/abs/2203.04074
- Code: https://github.com/zhang-tao-whu/e2ec

**Mask Transfiner for High-Quality Instance Segmentation**

- Paper: https://arxiv.org/abs/2111.13673
- Code: https://github.com/SysCV/transfiner

**Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity**

- Homepage: https://sites.google.com/view/generic-grouping/
- Paper: https://arxiv.org/abs/2204.06107
- Code: https://github.com/facebookresearch/Generic-Grouping

## 自监督实例分割(Self-Supervised Instance Segmentation)

**FreeSOLO: Learning to Segment Objects without Annotations**

- Paper: https://arxiv.org/abs/2202.12181
- Code: https://github.com/NVlabs/FreeSOLO

## 视频实例分割(Video Instance Segmentation)

**Efficient Video Instance Segmentation via Tracklet Query and Proposal**

- Homepage: https://jialianwu.com/projects/EfficientVIS.html
- Paper: https://arxiv.org/abs/2203.01853
- Demo: https://youtu.be/sSPMzgtMKCE

**Temporally Efficient Vision Transformer for Video Instance Segmentation**

- Paper: https://arxiv.org/abs/2204.08412
- Code: https://github.com/hustvl/TeViT

<a name="Panoptic-Segmentation"></a>

# 全景分割(Panoptic Segmentation)

**Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers**

- Paper: https://arxiv.org/abs/2109.03814
- Code: https://github.com/zhiqi-li/Panoptic-SegFormer

**Large-scale Video Panoptic Segmentation in the Wild: A Benchmark**

- Paper: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset/blob/main/VIPSeg2022.pdf
- Code: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset
- Dataset: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset 

<a name="FFC"></a>

# 小样本分类(Few-Shot Classification)

**Integrative Few-Shot Learning for Classification and Segmentation**

- Paper: https://arxiv.org/abs/2203.15712
- Code: https://github.com/dahyun-kang/ifsl

**Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification**

- Paper: https://arxiv.org/abs/2106.05517
- Code: https://github.com/LouieYang/MCL

<a name="FFS"></a>

# 小样本分割(Few-Shot Segmentation)

**Learning What Not to Segment: A New Perspective on Few-Shot Segmentation**

- Paper: https://arxiv.org/abs/2203.07615
- Code: https://github.com/chunbolang/BAM

**Integrative Few-Shot Learning for Classification and Segmentation**

- Paper: https://arxiv.org/abs/2203.15712
- Code: https://github.com/dahyun-kang/ifsl

**Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation**

- Paper: https://arxiv.org/abs/2204.10638
- Code: None

<a name="Matting"></a>

# 图像抠图(Image Matting)

**Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation**

- Paper: https://arxiv.org/abs/2201.06889
- Code: None

<a name="VU"></a>

# 视频理解(Video Understanding)

**Self-supervised Video Transformer**

- Homepage: https://kahnchana.github.io/svt/
- Paper: https://arxiv.org/abs/2112.01514
- Code: https://github.com/kahnchana/svt

**TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting**

- Paper(Oral): https://arxiv.org/abs/2204.01018
- Code: https://github.com/SvipRepetitionCounting/TransRAC

**FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment**

- Paper(Oral): https://arxiv.org/abs/2204.03646
- Dataset: https://github.com/xujinglin/FineDiving
- Code: https://github.com/xujinglin/FineDiving
- 中文解读：https://mp.weixin.qq.com/s/8t12Y34eMNwvJr8PeryWXg

**Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition**

- Paper(Oral): https://arxiv.org/abs/2204.02148
- Code: None

## 行为识别(Action Recognition)

**Spatio-temporal Relation Modeling for Few-shot Action Recognition**

- Paper: https://arxiv.org/abs/2112.05132
- Code: https://github.com/Anirudh257/strm

## 动作检测(Action Detection)

**End-to-End Semi-Supervised Learning for Video Action Detection**

- Paper: https://arxiv.org/abs/2203.04251
- Code: None

<a name="Image-Editing"></a>

# 图像编辑(Image Editing)

**Style Transformer for Image Inversion and Editing**

- Paper: https://arxiv.org/abs/2203.07932
- Code: https://github.com/sapphire497/style-transformer

**Blended Diffusion for Text-driven Editing of Natural Images**

- Paper: https://arxiv.org/abs/2111.14818
- Code: https://github.com/omriav/blended-diffusion

**SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing**

- Homepage: https://semanticstylegan.github.io/
- Paper: https://arxiv.org/abs/2112.02236
- Demo: https://semanticstylegan.github.io/videos/demo.mp4

<a name="LLV"></a>

# Low-level Vision

**ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior**

- Paper: https://arxiv.org/abs/2111.15362
- Code: None

**Restormer: Efficient Transformer for High-Resolution Image Restoration**

- Paper: https://arxiv.org/abs/2111.09881
- Code: https://github.com/swz30/Restormer

**Robust Equivariant Imaging: a fully unsupervised framework for learning to image from noisy and partial measurements**

- Paper(Oral): https://arxiv.org/abs/2111.12855
- Code: https://github.com/edongdongchen/REI

<a name="Super-Resolution"></a>

# 超分辨率(Super-Resolution)

## 图像超分辨率(Image Super-Resolution)

**Learning the Degradation Distribution for Blind Image Super-Resolution**

- Paper: https://arxiv.org/abs/2203.04962
- Code: https://github.com/greatlog/UnpairedSR

## 视频超分辨率(Video Super-Resolution)

**BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment**

- Paper: https://arxiv.org/abs/2104.13371
- Code: https://github.com/open-mmlab/mmediting
- Code: https://github.com/ckkelvinchan/BasicVSR_PlusPlus
- 中文解读：https://mp.weixin.qq.com/s/HZTwYfphixyLHxlbCAxx4g

**Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling**

- Paper: https://arxiv.org/abs/2204.07114
- Code: None

**A New Dataset and Transformer for Stereoscopic Video Super-Resolution**

- Paper: https://arxiv.org/abs/2204.10039
- Code: https://github.com/H-deep/Trans-SVSR/
- Dataset: http://shorturl.at/mpwGX

<a name="Deblur"></a>

# 去模糊(Deblur)

## 图像去模糊(Image Deblur)

**Learning to Deblur using Light Field Generated and Real Defocus Images**

- Homepage: http://lyruan.com/Projects/DRBNet/
- Paper(Oral): https://arxiv.org/abs/2204.00442
- Code: https://github.com/lingyanruan/DRBNet

<a name="3D-Point-Cloud"></a>

# 3D点云(3D Point Cloud)

**Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling**

- Homepage: https://point-bert.ivg-research.xyz/
- Paper: https://arxiv.org/abs/2111.14819
- Code: https://github.com/lulutang0608/Point-BERT

**A Unified Query-based Paradigm for Point Cloud Understanding**

- Paper: https://arxiv.org/abs/2203.01252
- Code: None 

**CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding**

- Paper: https://arxiv.org/abs/2203.00680
- Code: https://github.com/MohamedAfham/CrossPoint

**PointCLIP: Point Cloud Understanding by CLIP**

- Paper: https://arxiv.org/abs/2112.02413
- Code: https://github.com/ZrrSkywalker/PointCLIP

**Fast Point Transformer**

- Homepage: http://cvlab.postech.ac.kr/research/FPT/
- Paper: https://arxiv.org/abs/2112.04702
- Code: https://github.com/POSTECH-CVLab/FastPointTransformer

**RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds**

- Paper: https://arxiv.org/abs/2205.11028
- Code: https://github.com/gxd1994/RCP

**The Devil is in the Pose: Ambiguity-free 3D Rotation-invariant Learning via Pose-aware Convolution**

- Paper: https://arxiv.org/abs/2205.15210
- Code: https://github.com/GostInShell/PaRI-Conv 

<a name="3D-Object-Detection"></a>

# 3D目标检测(3D Object Detection)

**Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds**

- Paper(Oral): https://arxiv.org/abs/2203.11139
- Code: https://github.com/yifanzhang713/IA-SSD
- Demo: https://www.youtube.com/watch?v=3jP2o9KXunA

**BoxeR: Box-Attention for 2D and 3D Transformers**

- Paper: https://arxiv.org/abs/2111.13087
- Code: https://github.com/kienduynguyen/BoxeR
- 中文解读：https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w

**Embracing Single Stride 3D Object Detector with Sparse Transformer**

- Paper: https://arxiv.org/abs/2112.06375
- Code: https://github.com/TuSimple/SST

**Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes** 

- Paper: https://arxiv.org/abs/2011.12001
- Code: https://github.com/qq456cvb/CanonicalVoting

**MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer**

- Paper: https://arxiv.org/abs/2203.10981
- Code: https://github.com/kuanchihhuang/MonoDTR

**HyperDet3D: Learning a Scene-conditioned 3D Object Detector**

- Paper: https://arxiv.org/abs/2204.05599
- Code: None

**OccAM's Laser: Occlusion-based Attribution Maps for 3D Object Detectors on LiDAR Data**

- Paper: https://arxiv.org/abs/2204.06577
- Code: https://github.com/dschinagl/occam

**DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection**

- Homepage: https://thudair.baai.ac.cn/index
- Paper: https://arxiv.org/abs/2204.05575
- Code: https://github.com/AIR-THU/DAIR-V2X

**Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions**

- Homepage: https://ithaca365.mae.cornell.edu/
- Paper: https://arxiv.org/abs/2208.01166

<a name="3DSS"></a>

# 3D语义分割(3D Semantic Segmentation)

**Scribble-Supervised LiDAR Semantic Segmentation**

- Paper: https://arxiv.org/abs/2203.08537
- Dataset: https://github.com/ouenal/scribblekitti

**Stratified Transformer for 3D Point Cloud Segmentation**

- Paper: https://arxiv.org/pdf/2203.14508.pdf
- Code: https://github.com/dvlab-research/Stratified-Transformer

# 3D实例分割(3D Instance Segmentation)

**Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions**

- Homepage: https://ithaca365.mae.cornell.edu/
- Paper: https://arxiv.org/abs/2208.01166

<a name="3D-Object-Tracking"></a>

# 3D目标跟踪(3D Object Tracking)

**Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds**

- Paper: https://arxiv.org/abs/2203.01730
- Code: https://github.com/Ghostish/Open3DSOT

**PTTR: Relational 3D Point Cloud Object Tracking with Transformer**

- Paper: https://arxiv.org/abs/2112.02857
- Code: https://github.com/Jasonkks/PTTR 

<a name="3D-Human-Pose-Estimation"></a>

# 3D人体姿态估计(3D Human Pose Estimation)

**MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation**

- Paper: https://arxiv.org/abs/2111.12707
- Code: https://github.com/Vegetebird/MHFormer
- 中文解读: https://zhuanlan.zhihu.com/p/439459426

**MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video**

- Paper: https://arxiv.org/abs/2203.00859
- Code: None

**Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation**

- Paper: https://arxiv.org/abs/2203.07697
- Code: None
- 中文解读：https://mp.weixin.qq.com/s/L_F28IFLXvs5R4V9TTUpRw

**BEV: Putting People in their Place: Monocular Regression of 3D People in Depth**

- Homepage: https://arthur151.github.io/BEV/BEV.html
- Paper: https://arxiv.org/abs/2112.08274
- Code: https://github.com/Arthur151/ROMP
- Dataset: https://github.com/Arthur151/Relative_Human
- Demo: https://www.youtube.com/watch?v=Q62fj_6AxRI

<a name="3DSSC"></a>

# 3D语义场景补全(3D Semantic Scene Completion)

**MonoScene: Monocular 3D Semantic Scene Completion**

- Paper: https://arxiv.org/abs/2112.00726
- Code: https://github.com/cv-rits/MonoScene

<a name="3D-R"></a>

# 3D重建(3D Reconstruction)

**BANMo: Building Animatable 3D Neural Models from Many Casual Videos**

- Homepage: https://banmo-www.github.io/
- Paper: https://arxiv.org/abs/2112.12761
- Code: https://github.com/facebookresearch/banmo
- 中文解读：https://mp.weixin.qq.com/s/NMHP8-xWwrX40vpGx55Qew

<a name="ReID"></a>

# 行人重识别(Person Re-identification)

**NFormer: Robust Person Re-identification with Neighbor Transformer**

- Paper: https://arxiv.org/abs/2204.09331
- Code: https://github.com/haochenheheda/NFormer

<a name="COD"></a>

# 伪装物体检测(Camouflaged Object Detection)

**Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection**

- Paper: https://arxiv.org/abs/2203.02688
- Code: https://github.com/lartpang/ZoomNet

<a name="Depth-Estimation"></a>

# 深度估计(Depth Estimation)

## 单目深度估计(Monocular Depth Estimation)

**NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation**

- Paper: https://arxiv.org/abs/2203.01502
- Code: None

**OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion**

- Paper: https://arxiv.org/abs/2203.00838
- Code: None

**Toward Practical Self-Supervised Monocular Indoor Depth Estimation**

- Paper: https://arxiv.org/abs/2112.02306
- Code: None

**P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior**

- Paper: https://arxiv.org/abs/2204.02091
- Code: https://github.com/SysCV/P3Depth

**Multi-Frame Self-Supervised Depth with Transformers**

- Homepage: https://sites.google.com/tri.global/depthformer
- Paper: https://arxiv.org/abs/2204.07616
- Code: None

<a name="Stereo-Matching"></a>

# 立体匹配(Stereo Matching)

**ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching**

- Paper: https://arxiv.org/abs/2203.02146
- Code: https://github.com/gangweiX/ACVNet

<a name="FM"></a>

# 特征匹配(Feature Matching)

**ClusterGNN: Cluster-based Coarse-to-Fine Graph Neural Network for Efficient Feature Matching**

- Paper: https://arxiv.org/abs/2204.11700
- Code: None

<a name="Lane-Detection"></a>

# 车道线检测(Lane Detection)

**Rethinking Efficient Lane Detection via Curve Modeling**

- Paper: https://arxiv.org/abs/2203.02431
- Code: https://github.com/voldemortX/pytorch-auto-drive
- Demo: https://user-images.githubusercontent.com/32259501/148680744-a18793cd-f437-461f-8c3a-b909c9931709.mp4

**A Keypoint-based Global Association Network for Lane Detection**

- Paper: https://arxiv.org/abs/2204.07335
- Code: https://github.com/Wolfwjs/GANet

<a name="Optical-Flow-Estimation"></a>

# 光流估计(Optical Flow Estimation)

**Imposing Consistency for Optical Flow Estimation**

- Paper: https://arxiv.org/abs/2204.07262
- Code: None

**Deep Equilibrium Optical Flow Estimation**

- Paper: https://arxiv.org/abs/2204.08442
- Code: https://github.com/locuslab/deq-flow

**GMFlow: Learning Optical Flow via Global Matching**

- Paper(Oral): https://arxiv.org/abs/2111.13680
- Code: https://github.com/haofeixu/gmflow

<a name="Image-Inpainting"></a>

# 图像修复(Image Inpainting)

**Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding**

- Paper: https://arxiv.org/abs/2203.00867
- Code: https://github.com/DQiaole/ZITS_inpainting

<a name="Image-Retrieval"></a>

# 图像检索(Image Retrieval)

**Correlation Verification for Image Retrieval**

- Paper(Oral): https://arxiv.org/abs/2204.01458
- Code: https://github.com/sungonce/CVNet

<a name="Face-Recognition"></a>

# 人脸识别(Face Recognition)

**AdaFace: Quality Adaptive Margin for Face Recognition**

- Paper(Oral): https://arxiv.org/abs/2204.00964 
- Code: https://github.com/mk-minchul/AdaFace

<a name="Crowd-Counting"></a>

# 人群计数(Crowd Counting)

**Leveraging Self-Supervision for Cross-Domain Crowd Counting**

- Paper: https://arxiv.org/abs/2103.16291
- Code: None

<a name="Medical-Image"></a>

# 医学图像(Medical Image)

**BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation**

- Paper: https://arxiv.org/abs/2203.02533
- Code: None

**Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification**

- Paper: https://arxiv.org/abs/2111.12918
- Code: https://github.com/FBLADL/ACPL

**DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis**

- Paper: https://arxiv.org/abs/2204.10437
- Code: https://github.com/JLiangLab/DiRA

<a name="Video Generation"></a>

# 视频生成(Video Generation)

**StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2**

- Homepage: https://universome.github.io/stylegan-v
- Paper: https://arxiv.org/abs/2112.14683
- Code: https://github.com/universome/stylegan-v
- Demo: https://kaust-cair.s3.amazonaws.com/stylegan-v/stylegan-v.mp4

<a name="Scene-Graph-Generation"></a>

# 场景图生成(Scene Graph Generation)

**SGTR: End-to-end Scene Graph Generation with Transformer**

- Paper: https://arxiv.org/abs/2112.12970
- Code: None

<a name="R-VOS"></a>

# 参考视频目标分割(Referring Video Object Segmentation)

**Language as Queries for Referring Video Object Segmentation**

- Paper: https://arxiv.org/abs/2201.00487
- Code: https://github.com/wjn922/ReferFormer

**ReSTR: Convolution-free Referring Image Segmentation Using Transformers**

- Paper: https://arxiv.org/abs/2203.16768
- Code: None

<a name="GR"></a>

# 步态识别(Gait Recognition)

**Gait Recognition in the Wild with Dense 3D Representations and A Benchmark**

- Homepage: https://gait3d.github.io/
- Paper: https://arxiv.org/abs/2204.02569
- Code: https://github.com/Gait3D/Gait3D-Benchmark

<a name="ST"></a>

# 风格迁移(Style Transfer)

**StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions**

- Homepage: https://lukashoel.github.io/stylemesh/
- Paper: https://arxiv.org/abs/2112.01530
- Code: https://github.com/lukasHoel/stylemesh
- Demo: https://www.youtube.com/watch?v=ZqgiTLcNcks

<a name="AD"></a>

# 异常检测(Anomaly Detection)

**UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection**

- Paper: https://arxiv.org/abs/2111.08644
- Dataset: https://github.com/lilygeorgescu/UBnormal

**Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection**

- Paper(Oral): https://arxiv.org/abs/2111.09099
- Code: https://github.com/ristea/sspcab

<a name="AE"></a>

# 对抗样本(Adversarial Examples)

**Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon**

- Paper: https://arxiv.org/abs/2203.03818
- Code: https://github.com/hncszyq/ShadowAttack

**LAS-AT: Adversarial Training with Learnable Attack Strategy**

- Paper(Oral): https://arxiv.org/abs/2203.06616
- Code: https://github.com/jiaxiaojunQAQ/LAS-AT

**Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection**

- Paper: https://arxiv.org/abs/2112.04532
- Code: https://github.com/joellliu/SegmentAndComplete

<a name="WSOL"></a>

# 弱监督物体定位(Weakly Supervised Object Localization)

**Weakly Supervised Object Localization as Domain Adaption**

- Paper: https://arxiv.org/abs/2203.01714
- Code: https://github.com/zh460045050/DA-WSOL_CVPR2022

<a name="ROD"></a>

# 雷达目标检测(Radar Object Detection)

**Exploiting Temporal Relations on Radar Perception for Autonomous Driving**

- Paper: https://arxiv.org/abs/2204.01184
- Code: None

<a name="HSI"></a>

# 高光谱图像重建(Hyperspectral Image Reconstruction)

**Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction**

- Paper: https://arxiv.org/abs/2111.07910
- Code: https://github.com/caiyuanhao1998/MST

<a name="Image-Stitching"></a>

# 图像拼接(Image Stitching)

**Deep Rectangling for Image Stitching: A Learning Baseline**

- Paper(Oral): https://arxiv.org/abs/2203.03831
- Code: https://github.com/nie-lang/DeepRectangling
- Dataset: https://github.com/nie-lang/DeepRectangling
- 中文解读：https://mp.weixin.qq.com/s/lp5AnrtO_9urp-Fv6Z0l2Q

<a name="Watermarking"></a>

# 水印(Watermarking)

**Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings**

- Paper: https://arxiv.org/abs/2104.13450
- Code: None

<a name="AC"></a>

# Action Counting

**TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting**

- Paper(Oral): https://arxiv.org/abs/2204.01018
- Dataset: https://svip-lab.github.io/dataset/RepCount_dataset.html
- Code: https://github.com/SvipRepetitionCounting/TransRAC

<a name="GSR"></a>

# Grounded Situation Recognition

**Collaborative Transformers for Grounded Situation Recognition**

- Paper: https://arxiv.org/abs/2203.16518
- Code: https://github.com/jhcho99/CoFormer

<a name="ZSL"></a>

# Zero-shot Learning

**Unseen Classes at a Later Time? No Problem**

- Paper: https://arxiv.org/abs/2203.16517
- Code: https://github.com/sumitramalagi/Unseen-classes-at-a-later-time

<a name="DeepFakes"></a>

# DeepFakes

**Detecting Deepfakes with Self-Blended Images**

- Paper(Oral): https://arxiv.org/abs/2204.08376
- Code: https://github.com/mapooon/SelfBlendedImages

<a name="Datasets"></a>

# 数据集(Datasets)

**It's About Time: Analog Clock Reading in the Wild**

- Homepage: https://charigyang.github.io/abouttime/
- Paper: https://arxiv.org/abs/2111.09162
- Code: https://github.com/charigyang/itsabouttime
- Demo: https://youtu.be/cbiMACA6dRc

**Toward Practical Self-Supervised Monocular Indoor Depth Estimation**

- Paper: https://arxiv.org/abs/2112.02306
- Code: None

**Kubric: A scalable dataset generator**

- Paper: https://arxiv.org/abs/2203.03570
- Code: https://github.com/google-research/kubric
- 中文解读：https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg

**Scribble-Supervised LiDAR Semantic Segmentation**

- Paper: https://arxiv.org/abs/2203.08537
- Dataset: https://github.com/ouenal/scribblekitti

**Deep Rectangling for Image Stitching: A Learning Baseline**

- Paper(Oral): https://arxiv.org/abs/2203.03831
- Code: https://github.com/nie-lang/DeepRectangling
- Dataset: https://github.com/nie-lang/DeepRectangling
- 中文解读：https://mp.weixin.qq.com/s/lp5AnrtO_9urp-Fv6Z0l2Q

**ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer**

- Homepage: https://ai.stanford.edu/~rhgao/objectfolder2.0/
- Paper: https://arxiv.org/abs/2204.02389
- Dataset: https://github.com/rhgao/ObjectFolder
- Demo: https://youtu.be/e5aToT3LkRA

**Shape from Polarization for Complex Scenes in the Wild**

- Homepage: https://chenyanglei.github.io/sfpwild/index.html
- Paper: https://arxiv.org/abs/2112.11377
- Code: https://github.com/ChenyangLEI/sfp-wild

**Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline**

- Homepage: https://zhang-pengyu.github.io/DUT-VTUAV/
- Paper: https://arxiv.org/abs/2204.04120

**TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting**

- Paper(Oral): https://arxiv.org/abs/2204.01018
- Dataset: https://svip-lab.github.io/dataset/RepCount_dataset.html
- Code: https://github.com/SvipRepetitionCounting/TransRAC

**FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment**

- Paper(Oral): https://arxiv.org/abs/2204.03646
- Dataset: https://github.com/xujinglin/FineDiving
- Code: https://github.com/xujinglin/FineDiving
- 中文解读：https://mp.weixin.qq.com/s/8t12Y34eMNwvJr8PeryWXg

**Aesthetic Text Logo Synthesis via Content-aware Layout Inferring**

- Paper: https://arxiv.org/abs/2204.02701
- Dataset: https://github.com/yizhiwang96/TextLogoLayout
- Code: https://github.com/yizhiwang96/TextLogoLayout

**DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection**

- Homepage: https://thudair.baai.ac.cn/index
- Paper: https://arxiv.org/abs/2204.05575
- Code: https://github.com/AIR-THU/DAIR-V2X

**A New Dataset and Transformer for Stereoscopic Video Super-Resolution**

- Paper: https://arxiv.org/abs/2204.10039
- Code: https://github.com/H-deep/Trans-SVSR/
- Dataset: http://shorturl.at/mpwGX

**Putting People in their Place: Monocular Regression of 3D People in Depth**

- Homepage: https://arthur151.github.io/BEV/BEV.html
- Paper: https://arxiv.org/abs/2112.08274
- Code: https://github.com/Arthur151/ROMP
- Dataset: https://github.com/Arthur151/Relative_Human

**UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection**

- Paper: https://arxiv.org/abs/2111.08644
- Dataset: https://github.com/lilygeorgescu/UBnormal

**DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion**

- Homepage: https://dancetrack.github.io
- Paper: https://arxiv.org/abs/2111.14690
- Dataset: https://github.com/DanceTrack/DanceTrack

**Visual Abductive Reasoning**

- Paper: https://arxiv.org/abs/2203.14040
- Code: https://github.com/leonnnop/VAR

**Large-scale Video Panoptic Segmentation in the Wild: A Benchmark**

- Paper: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset/blob/main/VIPSeg2022.pdf
- Code: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset
- Dataset: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset

**Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions**

- Homepage: https://ithaca365.mae.cornell.edu/
- Paper: https://arxiv.org/abs/2208.01166

<a name="New-Tasks"></a>

# 新任务(New Tasks)

**Language-based Video Editing via Multi-Modal Multi-Level Transformer**

- Paper: https://arxiv.org/abs/2104.01122
- Code: None

**It's About Time: Analog Clock Reading in the Wild**

- Homepage: https://charigyang.github.io/abouttime/
- Paper: https://arxiv.org/abs/2111.09162
- Code: https://github.com/charigyang/itsabouttime
- Demo: https://youtu.be/cbiMACA6dRc

**Splicing ViT Features for Semantic Appearance Transfer**

- Homepage: https://splice-vit.github.io/
- Paper: https://arxiv.org/abs/2201.00424
- Code: https://github.com/omerbt/Splice

**Visual Abductive Reasoning**

- Paper: https://arxiv.org/abs/2203.14040
- Code: https://github.com/leonnnop/VAR

<a name="Others"></a>

# 其他(Others)

**Kubric: A scalable dataset generator**

- Paper: https://arxiv.org/abs/2203.03570
- Code: https://github.com/google-research/kubric
- 中文解读：https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg

**X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning**

- Paper: https://arxiv.org/abs/2203.00843
- Code: https://github.com/CurryYuan/X-Trans2Cap

**Balanced MSE for Imbalanced Visual Regression**

- Paper(Oral): https://arxiv.org/abs/2203.16427
- Code: https://github.com/jiawei-ren/BalancedMSE

**SNUG: Self-Supervised Neural Dynamic Garments**

- Homepage: http://mslab.es/projects/SNUG/
- Paper(Oral): https://arxiv.org/abs/2204.02219
- Code: https://github.com/isantesteban/snug

**Shape from Polarization for Complex Scenes in the Wild**

- Homepage: https://chenyanglei.github.io/sfpwild/index.html
- Paper: https://arxiv.org/abs/2112.11377
- Code: https://github.com/ChenyangLEI/sfp-wild

**LASER: LAtent SpacE Rendering for 2D Visual Localization**

- Paper(Oral): https://arxiv.org/abs/2204.00157
- Code: None

**Single-Photon Structured Light**

- Paper(Oral): https://arxiv.org/abs/2204.05300
- Code: None

**3DeformRS: Certifying Spatial Deformations on Point Clouds**

- Paper: https://arxiv.org/abs/2204.05687
- Code: None

**Aesthetic Text Logo Synthesis via Content-aware Layout Inferring**

- Paper: https://arxiv.org/abs/2204.02701
- Dataset: https://github.com/yizhiwang96/TextLogoLayout
- Code: https://github.com/yizhiwang96/TextLogoLayout

**Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes**

- Paper: https://arxiv.org/abs/2203.13412
- Code: https://github.com/zjsong/SSPL

**Robust and Accurate Superquadric Recovery: a Probabilistic Approach**

- Paper(Oral): https://arxiv.org/abs/2111.14517
- Code: https://github.com/bmlklwx/EMS-superquadric_fitting

**Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence**

- Paper: https://arxiv.org/abs/2203.00911
- Code: None

**Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer**

- Paper(Oral): https://arxiv.org/abs/2204.08680
- Code: https://github.com/zengwang430521/TCFormer

**DeepDPM: Deep Clustering With an Unknown Number of Clusters**

- Paper: https://arxiv.org/abs/2203.14309
- Code: https://github.com/BGU-CS-VIL/DeepDPM

**ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic**

- Paper: https://arxiv.org/abs/2111.14447
- Code: https://github.com/YoadTew/zero-shot-image-to-text

**Proto2Proto: Can you recognize the car, the way I do?**

- Paper: https://arxiv.org/abs/2204.11830
- Code: https://github.com/archmaester/proto2proto

**Putting People in their Place: Monocular Regression of 3D People in Depth**

- Homepage: https://arthur151.github.io/BEV/BEV.html
- Paper: https://arxiv.org/abs/2112.08274
- Code: https://github.com/Arthur151/ROMP
- Dataset: https://github.com/Arthur151/Relative_Human

**Light Field Neural Rendering**

- Homepage: https://light-field-neural-rendering.github.io/
- Paper(Oral): https://arxiv.org/abs/2112.09687
- Code: https://github.com/google-research/google-research/tree/master/light_field_neural_rendering

**Neural Texture Extraction and Distribution for Controllable Person Image Synthesis**

- Paper: https://arxiv.org/abs/2204.06160
- Code: https://github.com/RenYurui/Neural-Texture-Extraction-Distribution

**Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning**

- Paper: https://arxiv.org/abs/2203.14333
- Code: https://github.com/0liliulei/LIIR  

================================================
FILE: CVPR2023-Papers-with-Code.md
================================================
# CVPR 2023 Papers and Open-Source Projects (Papers with Code)

A collection of [CVPR 2023](https://openaccess.thecvf.com/CVPR2023?day=all) papers and open-source projects (papers with code)!

**25.78% = 2360 / 9155**

CVPR 2023 decisions are now available on OpenReview! This year, we received a record number of **9155** submissions (a 12% increase over CVPR 2022) and accepted **2360** papers, for a 25.78% acceptance rate.


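> Note 1: Everyone is welcome to open an issue to share CVPR 2023 papers and open-source projects!
>
> Note 2: For papers from past top CV conferences, plus other high-quality CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision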
>
> - [CVPR 2019](CVPR2019-Papers-with-Code.md)
> - [CVPR 2020](CVPR2020-Papers-with-Code.md)
> - [CVPR 2021](CVPR2021-Papers-with-Code.md)
> - [CVPR 2022](CVPR2022-Papers-with-Code.md)

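If you want to keep up with the latest and best CV papers, open-source projects, and learning materials, scan the QR code to join the 【CVer学术交流群】 (CVer Academic Exchange Group)! Let's learn from each other and make progress together~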

![](CVer学术交流群.png)

# 【CVPR 2023 Open-Source Paper Index】

- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [MAE](#MAE)
- [GAN](#GAN)
- [GNN](#GNN)
- [MLP](#MLP)
- [NAS](#NAS)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [DETR](#DETR)
- [Prompt](#Prompt)
- [Diffusion Models(扩散模型)](#Diffusion)
- [Avatars](#Avatars)
- [ReID(重识别)](#ReID)
- [长尾分布(Long-Tail)](#Long-Tail)
- [Vision Transformer](#Vision-Transformer)
- [视觉和语言(Vision-Language)](#VL)
- [自监督学习(Self-supervised Learning)](#SSL)
- [数据增强(Data Augmentation)](#DA)
- [目标检测(Object Detection)](#Object-Detection)
- [目标跟踪(Visual Tracking)](#VT)
- [语义分割(Semantic Segmentation)](#Semantic-Segmentation)
- [实例分割(Instance Segmentation)](#Instance-Segmentation)
- [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation)
- [医学图像分割(Medical Image Segmentation)](#MIS)
- [视频目标分割(Video Object Segmentation)](#VOS)
- [视频实例分割(Video Instance Segmentation)](#VIS)
- [参考图像分割(Referring Image Segmentation)](#RIS)
- [图像抠图(Image Matting)](#Matting)
- [图像编辑(Image Editing)](#Image-Editing)
- [Low-level Vision](#LLV)
- [超分辨率(Super-Resolution)](#SR)
- [去噪(Denoising)](#Denoising)
- [去模糊(Deblur)](#Deblur)
- [3D点云(3D Point Cloud)](#3D-Point-Cloud)
- [3D目标检测(3D Object Detection)](#3DOD)
- [3D语义分割(3D Semantic Segmentation)](#3DSS)
- [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking)
- [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC)
- [3D配准(3D Registration)](#3D-Registration)
- [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation)
- [3D人体Mesh估计(3D Human Mesh Estimation)](#3D-Human-Mesh-Estimation)
- [医学图像(Medical Image)](#Medical-Image)
- [图像生成(Image Generation)](#Image-Generation)
- [视频生成(Video Generation)](#Video-Generation)
- [视频理解(Video Understanding)](#Video-Understanding)
- [行为检测(Action Detection)](#Action-Detection)
- [文本检测(Text Detection)](#Text-Detection)
- [知识蒸馏(Knowledge Distillation)](#KD)
- [模型剪枝(Model Pruning)](#Pruning)
- [图像压缩(Image Compression)](#IC)
- [异常检测(Anomaly Detection)](#AD)
- [三维重建(3D Reconstruction)](#3D-Reconstruction)
- [深度估计(Depth Estimation)](#Depth-Estimation)
- [轨迹预测(Trajectory Prediction)](#TP)
- [车道线检测(Lane Detection)](#Lane-Detection)
- [图像描述(Image Captioning)](#Image-Captioning)
- [视觉问答(Visual Question Answering)](#VQA)
- [手语识别(Sign Language Recognition)](#SLR)
- [视频预测(Video Prediction)](#Video-Prediction)
- [新视点合成(Novel View Synthesis)](#NVS)
- [Zero-Shot Learning(零样本学习)](#ZSL)
- [立体匹配(Stereo Matching)](#Stereo-Matching)
- [特征匹配(Feature Matching)](#Feature-Matching)
- [场景图生成(Scene Graph Generation)](#SGG)
- [隐式神经表示(Implicit Neural Representations)](#INR)
- [图像质量评价(Image Quality Assessment)](#IQA)
- [数据集(Datasets)](#Datasets)
- [新任务(New Tasks)](#New-Tasks)
- [其他(Others)](#Others)

<a name="Backbone"></a>

# Backbone

**Integrally Pre-Trained Transformer Pyramid Networks** 

- Paper: https://arxiv.org/abs/2211.12735
- Code: https://github.com/sunsmarterjie/iTPN

**Stitchable Neural Networks**

- Homepage: https://snnet.github.io/
- Paper: https://arxiv.org/abs/2302.06586
- Code: https://github.com/ziplab/SN-Net

**Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks**

- Paper: https://arxiv.org/abs/2303.03667
- Code: https://github.com/JierunChen/FasterNet 

**BiFormer: Vision Transformer with Bi-Level Routing Attention**

- Paper: https://arxiv.org/abs/2303.08810
- Code: https://github.com/rayleizhu/BiFormer 

**DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network**

- Paper: https://arxiv.org/abs/2303.02165
- Code: https://github.com/alibaba/lightweight-neural-architecture-search 

**Vision Transformer with Super Token Sampling**

- Paper: https://arxiv.org/abs/2211.11167
- Code: https://github.com/hhb072/SViT

**Hard Patches Mining for Masked Image Modeling**

- Paper: None
- Code: None

**SMPConv: Self-moving Point Representations for Continuous Convolution**

- Paper: https://arxiv.org/abs/2304.02330
- Code: https://github.com/sangnekim/SMPConv

**Making Vision Transformers Efficient from A Token Sparsification View**

- Paper: https://arxiv.org/abs/2303.08685
- Code: https://github.com/changsn/STViT-R 

<a name="CLIP"></a>

# CLIP

**GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis**

- Paper: https://arxiv.org/abs/2301.12959
- Code: https://github.com/tobran/GALIP

**DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation**

- Paper: https://arxiv.org/abs/2303.06285
- Code: https://github.com/Yueming6568/DeltaEdit 

<a name="MAE"></a>

# MAE

**Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders** 

- Paper: https://arxiv.org/abs/2212.06785
- Code: https://github.com/ZrrSkywalker/I2P-MAE

**Generic-to-Specific Distillation of Masked Autoencoders**

- Paper: https://arxiv.org/abs/2302.14771
- Code: https://github.com/pengzhiliang/G2SD

<a name="GAN"></a>

# GAN

**DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation**

- Paper: https://arxiv.org/abs/2303.06285
- Code: https://github.com/Yueming6568/DeltaEdit 

<a name="NeRF"></a>

# NeRF

**NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior**

- Homepage: https://nope-nerf.active.vision/
- Paper: https://arxiv.org/abs/2212.07388
- Code: None

**Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures**

- Paper: https://arxiv.org/abs/2211.07600
- Code: https://github.com/eladrich/latent-nerf

**NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis**

- Paper: https://arxiv.org/abs/2301.08556
- Code: None

**Panoptic Lifting for 3D Scene Understanding with Neural Fields**

- Homepage: https://nihalsid.github.io/panoptic-lifting/
- Paper: https://arxiv.org/abs/2212.09802
- Code: None

**NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer**

- Homepage: https://redrock303.github.io/nerflix/
- Paper: https://arxiv.org/abs/2303.06919 
- Code: None

**HNeRV: A Hybrid Neural Representation for Videos**

- Homepage: https://haochen-rye.github.io/HNeRV
- Paper: https://arxiv.org/abs/2304.02633
- Code: https://github.com/haochen-rye/HNeRV

<a name="DETR"></a>

# DETR

**DETRs with Hybrid Matching**

- Paper: https://arxiv.org/abs/2207.13080
- Code: https://github.com/HDETR

<a name="Prompt"></a>

# Prompt

**Diversity-Aware Meta Visual Prompting**

- Paper: https://arxiv.org/abs/2303.08138
- Code: https://github.com/shikiw/DAM-VP 

<a name="NAS"></a>

# NAS

**PA&DA: Jointly Sampling PAth and DAta for Consistent NAS**

- Paper: https://arxiv.org/abs/2302.14772
- Code: https://github.com/ShunLu91/PA-DA

<a name="Avatars"></a>

# Avatars

**Structured 3D Features for Reconstructing Relightable and Animatable Avatars**

- Homepage: https://enriccorona.github.io/s3f/
- Paper: https://arxiv.org/abs/2212.06820
- Code: None
- Demo: https://www.youtube.com/watch?v=mcZGcQ6L-2s

**Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos**

- Homepage: https://augmentedperception.github.io/monoavatar/
- Paper: https://arxiv.org/abs/2304.01436

<a name="ReID"></a>

# ReID(重识别)

**Clothing-Change Feature Augmentation for Person Re-Identification**

- Paper: None
- Code: None

**MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID**

- Paper: https://arxiv.org/abs/2303.07065
- Code: https://github.com/vimar-gu/MSINet

**Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification**

- Paper: https://arxiv.org/abs/2304.04205
- Code: None

**Large-scale Training Data Search for Object Re-identification**

- Paper: https://arxiv.org/abs/2303.16186
- Code: https://github.com/yorkeyao/SnP 

<a name="Diffusion"></a>

# Diffusion Models(扩散模型)

**Video Probabilistic Diffusion Models in Projected Latent Space** 

- Homepage: https://sihyun.me/PVDM/
- Paper: https://arxiv.org/abs/2302.07685
- Code: https://github.com/sihyun-yu/PVDM

**Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models**

- Paper: https://arxiv.org/abs/2211.10655
- Code: None

**Imagic: Text-Based Real Image Editing with Diffusion Models**

- Homepage: https://imagic-editing.github.io/
- Paper: https://arxiv.org/abs/2210.09276
- Code: None

**Parallel Diffusion Models of Operator and Image for Blind Inverse Problems**

- Paper: https://arxiv.org/abs/2211.10656
- Code: None

**DiffRF: Rendering-guided 3D Radiance Field Diffusion**

- Homepage: https://sirwyver.github.io/DiffRF/
- Paper: https://arxiv.org/abs/2212.01206
- Code: None

**MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation**

- Paper: https://arxiv.org/abs/2212.09478
- Code: https://github.com/researchmm/MM-Diffusion

**HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising**

- Homepage: https://aminshabani.github.io/housediffusion/
- Paper: https://arxiv.org/abs/2211.13287
- Code: https://github.com/aminshabani/house_diffusion 

**TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets**

- Paper: https://arxiv.org/abs/2303.05762
- Code: https://github.com/chenweixin107/TrojDiff

**Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption**

- Paper: https://arxiv.org/abs/2207.03442
- Code: https://github.com/shiyegao/DDA 

**DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration**

- Paper: https://arxiv.org/abs/2303.06885
- Code: None

**Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion**

- Homepage: https://nv-tlabs.github.io/trace-pace/
- Paper: https://arxiv.org/abs/2304.01893
- Code: None

**Generative Diffusion Prior for Unified Image Restoration and Enhancement**

- Paper: https://arxiv.org/abs/2304.01247
- Code: None

**Conditional Image-to-Video Generation with Latent Flow Diffusion Models**

- Paper: https://arxiv.org/abs/2303.13744
- Code: https://github.com/nihaomiao/CVPR23_LFDM 

<a name="Long-Tail"></a>

# 长尾分布(Long-Tail)

**Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation**

- Paper: https://arxiv.org/abs/2304.01279
- Code: None

<a name="Vision-Transformer"></a>

# Vision Transformer

**Integrally Pre-Trained Transformer Pyramid Networks** 

- Paper: https://arxiv.org/abs/2211.12735
- Code: https://github.com/sunsmarterjie/iTPN

**Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors**

- Homepage: https://niessnerlab.org/projects/hou2023mask3d.html
- Paper: https://arxiv.org/abs/2302.14746
- Code: None

**Learning Trajectory-Aware Transformer for Video Super-Resolution**

- Paper: https://arxiv.org/abs/2204.04216
- Code: https://github.com/researchmm/TTVSR

**Vision Transformers are Parameter-Efficient Audio-Visual Learners**

- Homepage: https://yanbo.ml/project_page/LAVISH/
- Code: https://github.com/GenjiB/LAVISH

**Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes**

- Paper: https://arxiv.org/abs/2303.04249
- Code: None

**DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets**

- Paper: https://arxiv.org/abs/2301.06051
- Code: https://github.com/Haiyang-W/DSVT

**DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting**

- Paper: https://arxiv.org/abs/2211.10772
- Code: https://github.com/ViTAE-Transformer/DeepSolo

**BiFormer: Vision Transformer with Bi-Level Routing Attention**

- Paper: https://arxiv.org/abs/2303.08810
- Code: https://github.com/rayleizhu/BiFormer

**Vision Transformer with Super Token Sampling**

- Paper: https://arxiv.org/abs/2211.11167
- Code: https://github.com/hhb072/SViT

**BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision**

- Paper: https://arxiv.org/abs/2211.10439
- Code: None

**BAEFormer: Bi-directional and Early Interaction Transformers for Bird’s Eye View Semantic Segmentation**

- Paper: None
- Code: None

**Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention**

- Paper: https://arxiv.org/abs/2304.03282
- Code: None

**Making Vision Transformers Efficient from A Token Sparsification View**

- Paper: https://arxiv.org/abs/2303.08685
- Code: https://github.com/changsn/STViT-R 

<a name="VL"></a>

# 视觉和语言(Vision-Language)

**GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods**

- Paper: https://arxiv.org/abs/2301.01893
- Code: None

**Teaching Structured Vision&Language Concepts to Vision&Language Models**

- Paper: https://arxiv.org/abs/2211.11733
- Code: None

**Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks**

- Paper: https://arxiv.org/abs/2211.09808
- Code: https://github.com/fundamentalvision/Uni-Perceiver

**Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training**

- Paper: https://arxiv.org/abs/2303.00040
- Code: None

**CapDet: Unifying Dense Captioning and Open-World Detection Pretraining**

- Paper: https://arxiv.org/abs/2303.02489
- Code: None

**Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding**

- Homepage: https://rllab-snu.github.io/projects/Meta-Explore/doc.html
- Paper: https://arxiv.org/abs/2303.04077
- Code: None

**All in One: Exploring Unified Video-Language Pre-training**

- Paper: https://arxiv.org/abs/2203.07303
- Code: https://github.com/showlab/all-in-one

**Position-guided Text Prompt for Vision Language Pre-training**

- Paper: https://arxiv.org/abs/2212.09737
- Code: https://github.com/sail-sg/ptp

**EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding**

- Paper: https://arxiv.org/abs/2209.14941
- Code: https://github.com/yanmin-wu/EDA

**FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks**

- Paper: https://arxiv.org/abs/2303.02483
- Code: https://github.com/BrandonHanx/FAME-ViL

**Align and Attend: Multimodal Summarization with Dual Contrastive Losses**

- Homepage: https://boheumd.github.io/A2Summ/
- Paper: https://arxiv.org/abs/2303.07284
- Code: https://github.com/boheumd/A2Summ

**Multi-Modal Representation Learning with Text-Driven Soft Masks**

- Paper: https://arxiv.org/abs/2304.00719
- Code: None

**Learning to Name Classes for Vision and Language Models**

- Paper: https://arxiv.org/abs/2304.01830
- Code: None

<a name="Object-Detection"></a>

# 目标检测(Object Detection)

**YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors**

- Paper: https://arxiv.org/abs/2207.02696
- Code: https://github.com/WongKinYiu/yolov7

**DETRs with Hybrid Matching**

- Paper: https://arxiv.org/abs/2207.13080
- Code: https://github.com/HDETR

**Enhanced Training of Query-Based Object Detection via Selective Query Recollection**

- Paper: https://arxiv.org/abs/2212.07593
- Code: https://github.com/Fangyi-Chen/SQR

**Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection**

- Paper: https://arxiv.org/abs/2303.05892
- Code: https://github.com/LutingWang/OADP

<a name="VT"></a>

# 目标跟踪(Object Tracking)

**Simple Cues Lead to a Strong Multi-Object Tracker**

- Paper: https://arxiv.org/abs/2206.04656
- Code: None

**Joint Visual Grounding and Tracking with Natural Language Specification**

- Paper: https://arxiv.org/abs/2303.12027
- Code: https://github.com/lizhou-cs/JointNLT 

<a name="Semantic-Segmentation"></a>

# 语义分割(Semantic Segmentation)

**Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos**

- Paper: https://arxiv.org/abs/2303.07224
- Code: https://github.com/THU-LYJ-Lab/AR-Seg

**FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding**

- Paper: https://arxiv.org/abs/2304.02135
- Code: https://github.com/uark-cviu/FREDOM

<a name="MIS"></a>

# 医学图像分割(Medical Image Segmentation)

**Label-Free Liver Tumor Segmentation**

- Paper: https://arxiv.org/abs/2303.14869
- Code: https://github.com/MrGiovanni/SyntheticTumors

**Directional Connectivity-based Segmentation of Medical Images**

- Paper: https://arxiv.org/abs/2304.00145
- Code: https://github.com/Zyun-Y/DconnNet

**Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation**

- Paper: https://arxiv.org/abs/2305.00673
- Code: https://github.com/DeepMed-Lab-ECNU/BCP

**Devil is in the Queries: Advancing Mask Transformers for Real-world Medical Image Segmentation and Out-of-Distribution Localization**

- Paper: https://arxiv.org/abs/2304.00212
- Code: None

**Fair Federated Medical Image Segmentation via Client Contribution Estimation**

- Paper: https://arxiv.org/abs/2303.16520
- Code: https://github.com/NVIDIA/NVFlare/tree/dev/research/fed-ce

**Ambiguous Medical Image Segmentation using Diffusion Models**

- Homepage: https://aimansnigdha.github.io/cimd/
- Paper: https://arxiv.org/abs/2304.04745
- Code: https://github.com/aimansnigdha/Ambiguous-Medical-Image-Segmentation-using-Diffusion-Models

**Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation**

- Paper: https://arxiv.org/abs/2303.13090
- Code: https://github.com/HengCai-NJU/DeSCO

**MagicNet: Semi-Supervised Multi-Organ Segmentation via Magic-Cube Partition and Recovery**

- Paper: https://arxiv.org/abs/2301.01767
- Code: https://github.com/DeepMed-Lab-ECNU/MagicNet

**MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation**

- Paper: https://openaccess.thecvf.com/content/CVPR2023/html/Wang_MCF_Mutual_Correction_Framework_for_Semi-Supervised_Medical_Image_Segmentation_CVPR_2023_paper.html
- Code: https://github.com/WYC-321/MCF

**Rethinking Few-Shot Medical Segmentation: A Vector Quantization View**

- Paper: https://openaccess.thecvf.com/content/CVPR2023/html/Huang_Rethinking_Few-Shot_Medical_Segmentation_A_Vector_Quantization_View_CVPR_2023_paper.html
- Code: None

**Pseudo-label Guided Contrastive Learning for Semi-supervised Medical Image Segmentation**

- Paper: https://openaccess.thecvf.com/content/CVPR2023/html/Basak_Pseudo-Label_Guided_Contrastive_Learning_for_Semi-Supervised_Medical_Image_Segmentation_CVPR_2023_paper.html
- Code: https://github.com/hritam-98/PatchCL-MedSeg

**SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation**

- Paper: https://arxiv.org/abs/2305.11012
- Code: None

**DoNet: Deep De-overlapping Network for Cytology Instance Segmentation**

- Paper: https://arxiv.org/abs/2303.14373
- Code: https://github.com/DeepDoNet/DoNet

<a name="VOS"></a>

# 视频目标分割(Video Object Segmentation)

**Two-shot Video Object Segmentation**

- Paper: https://arxiv.org/abs/2303.12078
- Code: https://github.com/yk-pku/Two-shot-Video-Object-Segmentation

 **Under Video Object Segmentation Section**

- Paper: https://arxiv.org/abs/2303.07815
- Code: None

<a name="VIS"></a>

# 视频实例分割(Video Instance Segmentation)

**Mask-Free Video Instance Segmentation**

- Paper: https://arxiv.org/abs/2303.15904
- Code: https://github.com/SysCV/MaskFreeVis 

<a name="RIS"></a>

# 参考图像分割(Referring Image Segmentation)

**PolyFormer: Referring Image Segmentation as Sequential Polygon Generation**

- Paper: https://arxiv.org/abs/2302.07387
- Code: None

<a name="3D-Point-Cloud"></a>

# 3D点云(3D Point Cloud)

**Physical-World Optical Adversarial Attacks on 3D Face Recognition**

- Paper: https://arxiv.org/abs/2205.13412
- Code: https://github.com/PolyLiYJ/SLAttack.git

**IterativePFN: True Iterative Point Cloud Filtering**

- Paper: https://arxiv.org/abs/2304.01529
- Code: https://github.com/ddsediri/IterativePFN

**Attention-based Point Cloud Edge Sampling**

- Homepage: https://junweizheng93.github.io/publications/APES/APES.html 
- Paper: https://arxiv.org/abs/2302.14673
- Code: https://github.com/JunweiZheng93/APES

<a name="3DOD"></a>

# 3D目标检测(3D Object Detection)

**DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets**

- Paper: https://arxiv.org/abs/2301.06051
- Code: https://github.com/Haiyang-W/DSVT 

**FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection**

- Paper: https://arxiv.org/abs/2301.04467
- Code: None

**3D Video Object Detection with Learnable Object-Centric Global Optimization**

- Paper: None
- Code: None

**Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection**

- Paper: https://arxiv.org/abs/2304.01464
- Code: https://github.com/azhuantou/HSSDA

<a name="3DSS"></a>

# 3D语义分割(3D Semantic Segmentation)

**Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation**

- Paper: https://arxiv.org/abs/2303.11203
- Code: https://github.com/l1997i/lim3d 

<a name="3DSSC"></a>

# 3D语义场景补全(3D Semantic Scene Completion)

**VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion**

- Paper: https://arxiv.org/abs/2302.12251
- Code: https://github.com/NVlabs/VoxFormer 

<a name="3D-Registration"></a>

# 3D配准(3D Registration)

**Robust Outlier Rejection for 3D Registration with Variational Bayes**

- Paper: https://arxiv.org/abs/2304.01514
- Code: https://github.com/Jiang-HB/VBReg

<a name="3D-Human-Pose-Estimation"></a>

# 3D人体姿态估计(3D Human Pose Estimation)

<a name="3D-Human-Mesh-Estimation"></a>

# 3D人体Mesh估计(3D Human Mesh Estimation)

**3D Human Mesh Estimation from Virtual Markers**

- Paper: https://arxiv.org/abs/2303.11726
- Code: https://github.com/ShirleyMaxx/VirtualMarker 

<a name="LLV"></a>

# Low-level Vision

**Causal-IR: Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective**

- Paper: https://arxiv.org/abs/2303.06859
- Code: https://github.com/lixinustc/Casual-IR-DIL 

**Burstormer: Burst Image Restoration and Enhancement Transformer**

- Paper: https://arxiv.org/abs/2304.01194
- Code: https://github.com/akshaydudhane16/Burstormer

<a name="SR"></a>

# 超分辨率(Super-Resolution)

**Super-Resolution Neural Operator**

- Paper: https://arxiv.org/abs/2303.02584
- Code: https://github.com/2y7c3/Super-Resolution-Neural-Operator 

## 视频超分辨率(Video Super-Resolution)

**Learning Trajectory-Aware Transformer for Video Super-Resolution**

- Paper: https://arxiv.org/abs/2204.04216
- Code: https://github.com/researchmm/TTVSR

<a name="Denoising"></a>

# 去噪(Denoising)

## 图像去噪(Image Denoising)

**Masked Image Training for Generalizable Deep Image Denoising**

- Paper: https://arxiv.org/abs/2303.13132
- Code: https://github.com/haoyuc/MaskedDenoising 

<a name="Image-Generation"></a>

# 图像生成(Image Generation)

**GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis**

- Paper: https://arxiv.org/abs/2301.12959
- Code: https://github.com/tobran/GALIP 

**MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis**

- Paper: https://arxiv.org/abs/2211.09117
- Code: https://github.com/LTH14/mage

**Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation**

- Paper: https://arxiv.org/abs/2304.01816
- Code: None

**Few-shot Semantic Image Synthesis with Class Affinity Transfer**

- Paper: https://arxiv.org/abs/2304.02321
- Code: None

**TopNet: Transformer-based Object Placement Network for Image Compositing**

- Paper: https://arxiv.org/abs/2304.03372
- Code: None

<a name="Video-Generation"></a>

# 视频生成(Video Generation)

**MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation**

- Paper: https://arxiv.org/abs/2212.09478
- Code: https://github.com/researchmm/MM-Diffusion

**Conditional Image-to-Video Generation with Latent Flow Diffusion Models**

- Paper: https://arxiv.org/abs/2303.13744
- Code: https://github.com/nihaomiao/CVPR23_LFDM 

<a name="Video-Understanding"></a>

# 视频理解(Video Understanding)

**Learning Transferable Spatiotemporal Representations from Natural Script Knowledge**

- Paper: https://arxiv.org/abs/2209.15280
- Code: https://github.com/TencentARC/TVTS

**Frame Flexible Network**

- Paper: https://arxiv.org/abs/2303.14817
- Code: https://github.com/BeSpontaneous/FFN

**Masked Motion Encoding for Self-Supervised Video Representation Learning**

- Paper: https://arxiv.org/abs/2210.06096
- Code: https://github.com/XinyuSun/MME

**MARLIN: Masked Autoencoder for facial video Representation LearnING**

- Paper: https://arxiv.org/abs/2211.06627
- Code: https://github.com/ControlNet/MARLIN 

<a name="Action-Detection"></a>

# 行为检测(Action Detection)

**TriDet: Temporal Action Detection with Relative Boundary Modeling**

- Paper: https://arxiv.org/abs/2303.07347
- Code: https://github.com/dingfengshi/TriDet 

<a name="Text-Detection"></a>

# 文本检测(Text Detection)

**DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting**

- Paper: https://arxiv.org/abs/2211.10772
- Code: https://github.com/ViTAE-Transformer/DeepSolo

<a name="KD"></a>

# 知识蒸馏(Knowledge Distillation)

**Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation**

- Paper: https://arxiv.org/abs/2302.14290
- Code: None

**Generic-to-Specific Distillation of Masked Autoencoders**

- Paper: https://arxiv.org/abs/2302.14771
- Code: https://github.com/pengzhiliang/G2SD

<a name="Pruning"></a>

# 模型剪枝(Model Pruning)

**DepGraph: Towards Any Structural Pruning**

- Paper: https://arxiv.org/abs/2301.12900
- Code: https://github.com/VainF/Torch-Pruning 

<a name="IC"></a>

# 图像压缩(Image Compression)

**Context-Based Trit-Plane Coding for Progressive Image Compression**

- Paper: https://arxiv.org/abs/2303.05715
- Code: https://github.com/seungminjeon-github/CTC

<a name="AD"></a>

# 异常检测(Anomaly Detection)

**Deep Feature In-painting for Unsupervised Anomaly Detection in X-ray Images**

- Paper: https://arxiv.org/abs/2111.13495
- Code: https://github.com/tiangexiang/SQUID 

<a name="3D-Reconstruction"></a>

# 三维重建(3D Reconstruction)

**OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields**

- Paper: https://arxiv.org/abs/2211.12886
- Code: None

**SparsePose: Sparse-View Camera Pose Regression and Refinement**

- Paper: https://arxiv.org/abs/2211.16991
- Code: None

**NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction**

- Paper: https://arxiv.org/abs/2303.02375
- Code: None

**Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition**

- Homepage: https://moygcc.github.io/vid2avatar/
- Paper: https://arxiv.org/abs/2302.11566
- Code: https://github.com/MoyGcc/vid2avatar
- Demo: https://youtu.be/EGi47YeIeGQ

**To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision**

- Paper: https://arxiv.org/abs/2106.09614
- Code: https://github.com/unibas-gravis/Occlusion-Robust-MoFA

**Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction**

- Paper: https://arxiv.org/abs/2303.05937
- Code: None

**3D Cinemagraphy from a Single Image**

- Homepage: https://xingyi-li.github.io/3d-cinemagraphy/
- Paper: https://arxiv.org/abs/2303.05724
- Code: https://github.com/xingyi-li/3d-cinemagraphy

**Revisiting Rotation Averaging: Uncertainties and Robust Losses**

- Paper: https://arxiv.org/abs/2303.05195
- Code: https://github.com/zhangganlin/GlobalSfMpy

**FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction**

- Paper: https://arxiv.org/abs/2211.13874
- Code: https://github.com/csbhr/FFHQ-UV 

**A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images**

- Homepage: https://younglbw.github.io/HRN-homepage/
- Paper: https://arxiv.org/abs/2302.14434
- Code: https://github.com/youngLBW/HRN

<a name="Depth-Estimation"></a>

# 深度估计(Depth Estimation)

**Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation**

- Paper: https://arxiv.org/abs/2211.13202
- Code: https://github.com/noahzn/Lite-Mono 

<a name="TP"></a>

# 轨迹预测(Trajectory Prediction)

**IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction**

- Paper: https://arxiv.org/abs/2303.00575
- Code: None

**EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning**

- Paper: https://arxiv.org/abs/2303.10876
- Code: https://github.com/MediaBrain-SJTU/EqMotion 

<a name="Lane-Detection"></a>

# 车道线检测(Lane Detection)

**Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection**

- Paper: https://arxiv.org/abs/2301.02371
- Code: https://github.com/tusen-ai/Anchor3DLane

**BEV-LaneDet: An Efficient 3D Lane Detection Based on Virtual Camera via Key-Points**

- Paper: https://arxiv.org/abs/2210.06006v3
- Code: https://github.com/gigo-team/bev_lane_det

<a name="Image-Captioning"></a>

# 图像描述(Image Captioning)

**ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing**

- Paper: https://arxiv.org/abs/2303.02437
- Code: None

**Cross-Domain Image Captioning with Discriminative Finetuning**

- Paper: https://arxiv.org/abs/2304.01662
- Code: None

**Model-Agnostic Gender Debiased Image Captioning**

- Paper: https://arxiv.org/abs/2304.03693
- Code: None

<a name="VQA"></a>

# 视觉问答(Visual Question Answering)

**MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering**

- Paper: https://arxiv.org/abs/2303.01239
- Code: https://github.com/jingjing12110/MixPHM

<a name="SLR"></a>

# 手语识别(Sign Language Recognition)

**Continuous Sign Language Recognition with Correlation Network**

- Paper: https://arxiv.org/abs/2303.03202
- Code: https://github.com/hulianyuyy/CorrNet

<a name="Video-Prediction"></a>

# 视频预测(Video Prediction)

**MOSO: Decomposing MOtion, Scene and Object for Video Prediction**

- Paper: https://arxiv.org/abs/2303.03684
- Code: https://github.com/anonymous202203/MOSO

<a name="NVS"></a>

# 新视点合成(Novel View Synthesis)

**3D Video Loops from Asynchronous Input**

- Homepage: https://limacv.github.io/VideoLoop3D_web/
- Paper: https://arxiv.org/abs/2303.05312
- Code: https://github.com/limacv/VideoLoop3D 

<a name="ZSL"></a>

# Zero-Shot Learning(零样本学习)

**Bi-directional Distribution Alignment for Transductive Zero-Shot Learning**

- Paper: https://arxiv.org/abs/2303.08698
- Code: https://github.com/Zhicaiwww/Bi-VAEGAN

**Semantic Prompt for Few-Shot Learning**

- Paper: None
- Code: None

<a name="Stereo-Matching"></a>

# 立体匹配(Stereo Matching)

**Iterative Geometry Encoding Volume for Stereo Matching**

- Paper: https://arxiv.org/abs/2303.06615
- Code: https://github.com/gangweiX/IGEV

**Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation**

- Paper: https://arxiv.org/abs/2304.00152
- Code: None

<a name="Feature-Matching"></a>

# 特征匹配(Feature Matching)

**Adaptive Spot-Guided Transformer for Consistent Local Feature Matching**

- Homepage: [https://astr2023.github.io](https://astr2023.github.io/) 
- Paper: https://arxiv.org/abs/2303.16624
- Code: https://github.com/ASTR2023/ASTR

<a name="SGG"></a>

# 场景图生成(Scene Graph Generation)

**Prototype-based Embedding Network for Scene Graph Generation**

- Paper: https://arxiv.org/abs/2303.07096
- Code: None

<a name="INR"></a>

# 隐式神经表示(Implicit Neural Representations)

**Polynomial Implicit Neural Representations For Large Diverse Datasets**

- Paper: https://arxiv.org/abs/2303.11424
- Code: https://github.com/Rajhans0/Poly_INR

<a name="IQA"></a>

# 图像质量评价(Image Quality Assessment)

**Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild**

- Paper: https://arxiv.org/abs/2304.00451
- Code: None

<a name="Datasets"></a>

# 数据集(Datasets)

**Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes**

- Paper: https://arxiv.org/abs/2303.02760
- Code: None

**Align and Attend: Multimodal Summarization with Dual Contrastive Losses**

- Homepage: https://boheumd.github.io/A2Summ/
- Paper: https://arxiv.org/abs/2303.07284
- Code: https://github.com/boheumd/A2Summ

**GeoNet: Benchmarking Unsupervised Adaptation across Geographies**

- Homepage: https://tarun005.github.io/GeoNet/
- Paper: https://arxiv.org/abs/2303.15443

**CelebV-Text: A Large-Scale Facial Text-Video Dataset**

- Homepage: https://celebv-text.github.io/
- Paper: https://arxiv.org/abs/2303.14717

<a name="Others"></a>

# 其他(Others)

**Interactive Segmentation as Gaussian Process Classification**

- Paper: https://arxiv.org/abs/2302.14578
- Code: None

**Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger**

- Paper: https://arxiv.org/abs/2302.14677
- Code: None

**SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries**

- Homepage: http://bit.ly/splinecam
- Paper: https://arxiv.org/abs/2302.12828
- Code: None

**SCOTCH and SODA: A Transformer Video Shadow Detection Framework**

- Paper: https://arxiv.org/abs/2211.06885
- Code: None

**DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization**

- Homepage: https://ai4ce.github.io/DeepMapping2/
- Paper: https://arxiv.org/abs/2212.06331
- Code: https://github.com/ai4ce/DeepMapping2

**Token Turing Machines**

- Paper: https://arxiv.org/abs/2211.09119
- Code: None

**Single Image Backdoor Inversion via Robust Smoothed Classifiers**

- Paper: https://arxiv.org/abs/2303.00215
- Code: https://github.com/locuslab/smoothinv

**To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision**

- Paper: https://arxiv.org/abs/2106.09614
- Code: https://github.com/unibas-gravis/Occlusion-Robust-MoFA

**HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics**

- Homepage: https://dolorousrtur.github.io/hood/
- Paper: https://arxiv.org/abs/2212.07242
- Code: https://github.com/dolorousrtur/hood
- Demo: https://www.youtube.com/watch?v=cBttMDPrUYY

**A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others**

- Paper: https://arxiv.org/abs/2212.04825
- Code: https://github.com/facebookresearch/Whac-A-Mole.git

**RelightableHands: Efficient Neural Relighting of Articulated Hand Models**

- Homepage: https://sh8.io/#/relightable_hands
- Paper: https://arxiv.org/abs/2302.04866
- Code: None
- Demo: https://sh8.io/static/media/teacher_video.923d87957fe0610730c2.mp4

**Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation**

- Paper: https://arxiv.org/abs/2303.00914
- Code: None

**Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression**

- Paper: https://arxiv.org/abs/2303.01052
- Code: None

**UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy**

- Paper: https://arxiv.org/abs/2303.00938
- Code: None

**Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness**

- Paper: https://arxiv.org/abs/2303.00971
- Code: https://github.com/zhijieshen-bjtu/DOPNet

**Learning Neural Parametric Head Models**

- Homepage: https://simongiebenhain.github.io/NPHM
- Paper: https://arxiv.org/abs/2212.02761
- Code: None

**A Meta-Learning Approach to Predicting Performance and Data Requirements**

- Paper: https://arxiv.org/abs/2303.01598
- Code: None

**MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision**

- Homepage: https://imagine.enpc.fr/~guedona/MACARONS/
- Paper: https://arxiv.org/abs/2303.03315
- Code: None

**Masked Images Are Counterfactual Samples for Robust Fine-tuning**

- Paper: https://arxiv.org/abs/2303.03052
- Code: None

**HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling**

- Paper: https://arxiv.org/abs/2303.02700
- Code: None

**Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization**

- Paper: https://arxiv.org/abs/2303.02328
- Code: None

**Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization**

- Paper: https://arxiv.org/abs/2303.03108
- Code: None

**Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples**

- Paper: https://arxiv.org/abs/2301.01217
- Code: https://github.com/jiamingzhang94/Unlearnable-Clusters 

**Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes**

- Paper: https://arxiv.org/abs/2303.04249
- Code: None

**UniHCP: A Unified Model for Human-Centric Perceptions**

- Paper: https://arxiv.org/abs/2303.02936
- Code: https://github.com/OpenGVLab/UniHCP

**CUDA: Convolution-based Unlearnable Datasets**

- Paper: https://arxiv.org/abs/2303.04278
- Code: https://github.com/vinusankars/Convolution-based-Unlearnability

**AdaptiveMix: Robust Feature Representation via Shrinking Feature Space**

- Paper: https://arxiv.org/abs/2303.01559
- Code: https://github.com/WentianZhang-ML/AdaptiveMix 

**Physical-World Optical Adversarial Attacks on 3D Face Recognition**

- Paper: https://arxiv.org/abs/2205.13412
- Code: https://github.com/PolyLiYJ/SLAttack.git

**DPE: Disentanglement of Pose and Expression for General Video Portrait Editing**

- Paper: https://arxiv.org/abs/2301.06281
- Code: https://carlyx.github.io/DPE/ 

**SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation**

- Paper: https://arxiv.org/abs/2211.12194
- Code: https://github.com/Winfredy/SadTalker

**Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models**

- Paper: None
- Code: None

**Sharpness-Aware Gradient Matching for Domain Generalization**

- Paper: None
- Code: https://github.com/Wang-pengfei/SAGM

**Mind the Label-shift for Augmentation-based Graph Out-of-distribution Generalization**

- Paper: None
- Code: None

**Blind Video Deflickering by Neural Filtering with a Flawed Atlas**

- Homepage: https://chenyanglei.github.io/deflicker
- Paper: None
- Code: None

**RiDDLE: Reversible and Diversified De-identification with Latent Encryptor**

- Paper: None
- Code: https://github.com/ldz666666/RiDDLE

**PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation**

- Paper: https://arxiv.org/abs/2303.07337
- Code: None

**Upcycling Models under Domain and Category Shift**

- Paper: https://arxiv.org/abs/2303.07110
- Code: https://github.com/ispc-lab/GLC

**Modality-Agnostic Debiasing for Single Domain Generalization**

- Paper: https://arxiv.org/abs/2303.07123
- Code: None

**Progressive Open Space Expansion for Open-Set Model Attribution**

- Paper: https://arxiv.org/abs/2303.06877
- Code: None

**Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies**

- Paper: https://arxiv.org/abs/2303.06856
- Code: None

**GFPose: Learning 3D Human Pose Prior with Gradient Fields**

- Paper: https://arxiv.org/abs/2212.08641
- Code: https://github.com/Embracing/GFPose 

**PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment**

- Paper: https://arxiv.org/abs/2303.11526
- Code: https://github.com/Zhang-VISLab

**Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings**

- Paper: https://arxiv.org/abs/2303.11502
- Code: None

**Boundary Unlearning**

- Paper: https://arxiv.org/abs/2303.11570
- Code: None

**ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing**

- Paper: https://arxiv.org/abs/2303.17096
- Code: https://github.com/alibaba/easyrobust

**Zero-shot Model Diagnosis**

- Paper: https://arxiv.org/abs/2303.15441
- Code: None

**GeoNet: Benchmarking Unsupervised Adaptation across Geographies**

- Homepage: https://tarun005.github.io/GeoNet/
- Paper: https://arxiv.org/abs/2303.15443

**Quantum Multi-Model Fitting**

- Paper: https://arxiv.org/abs/2303.15444
- Code: https://github.com/FarinaMatteo/qmmf

**DivClust: Controlling Diversity in Deep Clustering**

- Paper: https://arxiv.org/abs/2304.01042
- Code: None

**Neural Volumetric Memory for Visual Locomotion Control**

- Homepage: https://rchalyang.github.io/NVM
- Paper: https://arxiv.org/abs/2304.01201
- Code: https://rchalyang.github.io/NVM

**MonoHuman: Animatable Human Neural Field from Monocular Video**

- Homepage: https://yzmblog.github.io/projects/MonoHuman/
- Paper: https://arxiv.org/abs/2304.02001
- Code: https://github.com/Yzmblog/MonoHuman

**Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion**

- Homepage: https://nv-tlabs.github.io/trace-pace/
- Paper: https://arxiv.org/abs/2304.01893
- Code: None

**Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification**

- Paper: https://arxiv.org/abs/2304.01804
- Code: None

**HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering**

- Paper: https://arxiv.org/abs/2304.01686
- Code: None

**On the Stability-Plasticity Dilemma of Class-Incremental Learning**

- Paper: https://arxiv.org/abs/2304.01663
- Code: None

**Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning**

- Paper: https://arxiv.org/abs/2304.01482
- Code: None

**VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution**

- Paper: https://arxiv.org/abs/2304.01434
- Code: https://github.com/jaeill/CVPR23-VNE

**Detecting and Grounding Multi-Modal Media Manipulation**

- Homepage: https://rshaojimmy.github.io/Projects/MultiModal-DeepFake
- Paper: https://arxiv.org/abs/2304.02556
- Code: https://github.com/rshaojimmy/MultiModal-DeepFake

**Meta-causal Learning for Single Domain Generalization**

- Paper: https://arxiv.org/abs/2304.03709
- Code: None

**Disentangling Writer and Character Styles for Handwriting Generation**

- Paper: https://arxiv.org/abs/2303.14736
- Code: https://github.com/dailenson/SDT

**DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects**

- Homepage: https://www.chenbao.tech/dexart/
- Code: https://github.com/Kami-code/dexart-release

**Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision**

- Homepage: https://toytiny.github.io/publication/23-cmflow-cvpr/index.html 
- Paper: https://arxiv.org/abs/2303.00462
- Code: https://github.com/Toytiny/CMFlow

**Marching-Primitives: Shape Abstraction from Signed Distance Function**

- Paper: https://arxiv.org/abs/2303.13190
- Code: https://github.com/ChirikjianLab/Marching-Primitives

**Towards Trustable Skin Cancer Diagnosis via Rewriting Model's Decision**

- Paper: https://arxiv.org/abs/2303.00885
- Code: None

================================================
FILE: CVPR2024-Papers-with-Code.md
================================================
# CVPR 2024 论文和开源项目合集(Papers with Code)

CVPR 2024 decisions are now available on OpenReview!


> Note 1: Everyone is welcome to submit an issue sharing CVPR 2024 papers and open-source projects!
>
> Note 2: For papers from previous years' top CV conferences, as well as other high-quality CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision
>
> - [ECCV 2024](https://github.com/amusi/ECCV2024-Papers-with-Code)
> - [CVPR 2023](CVPR2023-Papers-with-Code.md)

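Scan the QR code to join the CVer Academic Exchange Group (CVer学术交流群), the largest computer vision and AI community on Knowledge Planet (知识星球)! It is updated daily with the latest, cutting-edge learning materials on computer vision, AI art, image processing, deep learning, autonomous driving, medical imaging, AIGC, and more. Start learning!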

![](CVer学术交流群.png)

# CVPR 2024 Open-Source Paper Index

- [3DGS(Gaussian Splatting)](#3DGS)
- [Avatars](#Avatars)
- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [MAE](#MAE)
- [Embodied AI](#Embodied-AI)
- [GAN](#GAN)
- [GNN](#GNN)
- [多模态大语言模型(MLLM)](#MLLM)
- [大语言模型(LLM)](#LLM)
- [NAS](#NAS)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [DETR](#DETR)
- [Prompt](#Prompt)
- [扩散模型(Diffusion Models)](#Diffusion)
- [ReID(重识别)](#ReID)
- [长尾分布(Long-Tail)](#Long-Tail)
- [Vision Transformer](#Vision-Transformer)
- [视觉和语言(Vision-Language)](#VL)
- [自监督学习(Self-supervised Learning)](#SSL)
- [数据增强(Data Augmentation)](#DA)
- [目标检测(Object Detection)](#Object-Detection)
- [异常检测(Anomaly Detection)](#Anomaly-Detection)
- [目标跟踪(Visual Tracking)](#VT)
- [语义分割(Semantic Segmentation)](#Semantic-Segmentation)
- [实例分割(Instance Segmentation)](#Instance-Segmentation)
- [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation)
- [医学图像(Medical Image)](#MI)
- [医学图像分割(Medical Image Segmentation)](#MIS)
- [视频目标分割(Video Object Segmentation)](#VOS)
- [视频实例分割(Video Instance Segmentation)](#VIS)
- [参考图像分割(Referring Image Segmentation)](#RIS)
- [图像抠图(Image Matting)](#Matting)
- [图像编辑(Image Editing)](#Image-Editing)
- [视频编辑(Video Editing)](#Video-Editing)
- [Low-level Vision](#LLV)
- [超分辨率(Super-Resolution)](#SR)
- [去噪(Denoising)](#Denoising)
- [去模糊(Deblur)](#Deblur)
- [自动驾驶(Autonomous Driving)](#Autonomous-Driving)
- [3D点云(3D Point Cloud)](#3D-Point-Cloud)
- [3D目标检测(3D Object Detection)](#3DOD)
- [3D语义分割(3D Semantic Segmentation)](#3DSS)
- [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking)
- [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC)
- [3D配准(3D Registration)](#3D-Registration)
- [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation)
- [3D人体Mesh估计(3D Human Mesh Estimation)](#3D-Human-Pose-Estimation)
- [图像生成(Image Generation)](#Image-Generation)
- [视频生成(Video Generation)](#Video-Generation)
- [3D生成(3D Generation)](#3D-Generation)
- [视频理解(Video Understanding)](#Video-Understanding)
- [行为检测(Action Detection)](#Action-Detection)
- [文本检测(Text Detection)](#Text-Detection)
- [知识蒸馏(Knowledge Distillation)](#KD)
- [模型剪枝(Model Pruning)](#Pruning)
- [图像压缩(Image Compression)](#IC)
- [三维重建(3D Reconstruction)](#3D-Reconstruction)
- [深度估计(Depth Estimation)](#Depth-Estimation)
- [轨迹预测(Trajectory Prediction)](#TP)
- [车道线检测(Lane Detection)](#Lane-Detection)
- [图像描述(Image Captioning)](#Image-Captioning)
- [视觉问答(Visual Question Answering)](#VQA)
- [手语识别(Sign Language Recognition)](#SLR)
- [视频预测(Video Prediction)](#Video-Prediction)
- [新视点合成(Novel View Synthesis)](#NVS)
- [Zero-Shot Learning(零样本学习)](#ZSL)
- [立体匹配(Stereo Matching)](#Stereo-Matching)
- [特征匹配(Feature Matching)](#Feature-Matching)
- [场景图生成(Scene Graph Generation)](#SGG)
- [隐式神经表示(Implicit Neural Representations)](#INR)
- [图像质量评价(Image Quality Assessment)](#IQA)
- [视频质量评价(Video Quality Assessment)](#Video-Quality-Assessment)
- [数据集(Datasets)](#Datasets)
- [新任务(New Tasks)](#New-Tasks)
- [其他(Others)](#Others)

<a name="3DGS"></a>

# 3DGS(Gaussian Splatting)

**Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering**

- Homepage: https://city-super.github.io/scaffold-gs/
- Paper: https://arxiv.org/abs/2312.00109
- Code: https://github.com/city-super/Scaffold-GS

**GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis**

- Homepage: https://shunyuanzheng.github.io/GPS-Gaussian 
- Paper: https://arxiv.org/abs/2312.02155
- Code: https://github.com/ShunyuanZheng/GPS-Gaussian

**GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians**

- Paper: https://arxiv.org/abs/2312.02134
- Code: https://github.com/huliangxiao/GaussianAvatar

**GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting**

- Paper: https://arxiv.org/abs/2311.14521
- Code: https://github.com/buaacyw/GaussianEditor 

**Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction**

- Homepage: https://ingra14m.github.io/Deformable-Gaussians/ 
- Paper: https://arxiv.org/abs/2309.13101
- Code: https://github.com/ingra14m/Deformable-3D-Gaussians

**SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes**

- Homepage: https://yihua7.github.io/SC-GS-web/ 
- Paper: https://arxiv.org/abs/2312.14937
- Code: https://github.com/yihua7/SC-GS

**Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis**

- Homepage: https://oppo-us-research.github.io/SpacetimeGaussians-website/ 
- Paper: https://arxiv.org/abs/2312.16812
- Code: https://github.com/oppo-us-research/SpacetimeGaussians

**DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization**

- Homepage: https://fictionarry.github.io/DNGaussian/
- Paper: https://arxiv.org/abs/2403.06912
- Code: https://github.com/Fictionarry/DNGaussian

**4D Gaussian Splatting for Real-Time Dynamic Scene Rendering**

- Paper: https://arxiv.org/abs/2310.08528
- Code: https://github.com/hustvl/4DGaussians

**GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models**

- Paper: https://arxiv.org/abs/2310.08529
- Code: https://github.com/hustvl/GaussianDreamer

<a name="Avatars"></a>

# Avatars

**GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians**

- Paper: https://arxiv.org/abs/2312.02134
- Code: https://github.com/huliangxiao/GaussianAvatar

**Real-Time Simulated Avatar from Head-Mounted Sensors**

- Homepage: https://www.zhengyiluo.com/SimXR/
- Paper: https://arxiv.org/abs/2403.06862

<a name="Backbone"></a>

# Backbone

**RepViT: Revisiting Mobile CNN From ViT Perspective**

- Paper: https://arxiv.org/abs/2307.09283
- Code: https://github.com/THU-MIG/RepViT

**TransNeXt: Robust Foveal Visual Perception for Vision Transformers**

- Paper: https://arxiv.org/abs/2311.17132
- Code: https://github.com/DaiShiResearch/TransNeXt

<a name="CLIP"></a>

# CLIP

**Alpha-CLIP: A CLIP Model Focusing on Wherever You Want**

- Paper: https://arxiv.org/abs/2312.03818
- Code: https://github.com/SunzeY/AlphaCLIP

**FairCLIP: Harnessing Fairness in Vision-Language Learning**

- Paper: https://arxiv.org/abs/2403.19949
- Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP

<a name="MAE"></a>

# MAE

<a name="Embodied-AI"></a>

# Embodied AI

**EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI**

- Homepage: https://tai-wang.github.io/embodiedscan/
- Paper: https://arxiv.org/abs/2312.16170
- Code: https://github.com/OpenRobotLab/EmbodiedScan

**MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception**

- Homepage: https://iranqin.github.io/MP5.github.io/ 
- Paper: https://arxiv.org/abs/2312.07472
- Code: https://github.com/IranQin/MP5

**LEMON: Learning 3D Human-Object Interaction Relation from 2D Images**

- Paper: https://arxiv.org/abs/2312.08963
- Code: https://github.com/yyvhang/lemon_3d 

<a name="GAN"></a>

# GAN

<a name="OCR"></a>

# OCR

**An Empirical Study of Scaling Law for OCR**

- Paper: https://arxiv.org/abs/2401.00028
- Code: https://github.com/large-ocr-model/large-ocr-model.github.io

**ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting**

- Paper: https://arxiv.org/abs/2403.00303
- Code: https://github.com/PriNing/ODM 

<a name="NeRF"></a>

# NeRF

**PIE-NeRF🍕: Physics-based Interactive Elastodynamics with NeRF**

- Paper: https://arxiv.org/abs/2311.13099
- Code: https://github.com/FYTalon/pienerf/ 

<a name="DETR"></a>

# DETR

**DETRs Beat YOLOs on Real-time Object Detection**

- Paper: https://arxiv.org/abs/2304.08069
- Code: https://github.com/lyuwenyu/RT-DETR

**Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement**

- Paper: https://arxiv.org/abs/2403.16131
- Code: https://github.com/xiuqhou/Salience-DETR

<a name="Prompt"></a>

# Prompt

<a name="MLLM"></a>

# 多模态大语言模型(MLLM)

**mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration**

- Paper: https://arxiv.org/abs/2311.04257
- Code: https://github.com/X-PLUG/mPLUG-Owl/tree/main/mPLUG-Owl2

**Link-Context Learning for Multimodal LLMs**

- Paper: https://arxiv.org/abs/2308.07891
- Code: https://github.com/isekai-portal/Link-Context-Learning/tree/main 

**OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation**

- Paper: https://arxiv.org/abs/2311.17911
- Code: https://github.com/shikiw/OPERA

**Making Large Multimodal Models Understand Arbitrary Visual Prompts**

- Homepage: https://vip-llava.github.io/ 
- Paper: https://arxiv.org/abs/2312.00784

**Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs**

- Paper: https://arxiv.org/abs/2310.00582
- Code: https://github.com/SY-Xuan/Pink

**Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding**

- Paper: https://arxiv.org/abs/2311.08046
- Code: https://github.com/PKU-YuanGroup/Chat-UniVi

**OneLLM: One Framework to Align All Modalities with Language**

- Paper: https://arxiv.org/abs/2312.03700
- Code: https://github.com/csuhan/OneLLM

<a name="LLM"></a>

# 大语言模型(LLM)

**VTimeLLM: Empower LLM to Grasp Video Moments**

- Paper: https://arxiv.org/abs/2311.18445
- Code: https://github.com/huangb23/VTimeLLM 

<a name="NAS"></a>

# NAS

<a name="ReID"></a>

# ReID(重识别)

**Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification**

- Paper: https://arxiv.org/abs/2403.10254
- Code: https://github.com/924973292/EDITOR 

**Noisy-Correspondence Learning for Text-to-Image Person Re-identification**

- Paper: https://arxiv.org/abs/2308.09911
- Code: https://github.com/QinYang79/RDE

<a name="Diffusion"></a>

# 扩散模型(Diffusion Models)

**InstanceDiffusion: Instance-level Control for Image Generation**

- Homepage: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/
- Paper: https://arxiv.org/abs/2402.03290
- Code: https://github.com/frank-xwang/InstanceDiffusion

**Residual Denoising Diffusion Models**

- Paper: https://arxiv.org/abs/2308.13712
- Code: https://github.com/nachifur/RDDM

**DeepCache: Accelerating Diffusion Models for Free**

- Paper: https://arxiv.org/abs/2312.00858
- Code: https://github.com/horseee/DeepCache

**DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations**

- Homepage: https://tianhao-qi.github.io/DEADiff/
- Paper: https://arxiv.org/abs/2403.06951
- Code: https://github.com/Tianhao-Qi/DEADiff_code

**SVGDreamer: Text Guided SVG Generation with Diffusion Model**

- Paper: https://arxiv.org/abs/2312.16476
- Code: https://ximinng.github.io/SVGDreamer-project/

**InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model**

- Paper: https://arxiv.org/abs/2312.05849
- Code: https://github.com/jiuntian/interactdiffusion

**MMA-Diffusion: MultiModal Attack on Diffusion Models**

- Paper: https://arxiv.org/abs/2311.17516
- Code: https://github.com/yangyijune/MMA-Diffusion

**VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models**

- Homepage: https://video-motion-customization.github.io/
- Paper: https://arxiv.org/abs/2312.00845
- Code: https://github.com/HyeonHo99/Video-Motion-Customization

<a name="Vision-Transformer"></a>

# Vision Transformer

**TransNeXt: Robust Foveal Visual Perception for Vision Transformers**

- Paper: https://arxiv.org/abs/2311.17132
- Code: https://github.com/DaiShiResearch/TransNeXt

**RepViT: Revisiting Mobile CNN From ViT Perspective**

- Paper: https://arxiv.org/abs/2307.09283
- Code: https://github.com/THU-MIG/RepViT

**A General and Efficient Training for Transformer via Token Expansion**

- Paper: https://arxiv.org/abs/2404.00672
- Code: https://github.com/Osilly/TokenExpansion 

<a name="VL"></a>

# 视觉和语言(Vision-Language)

**PromptKD: Unsupervised Prompt Distillation for Vision-Language Models**

- Paper: https://arxiv.org/abs/2403.02781
- Code: https://github.com/zhengli97/PromptKD

**FairCLIP: Harnessing Fairness in Vision-Language Learning**

- Paper: https://arxiv.org/abs/2403.19949
- Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP

<a name="Object-Detection"></a>

# 目标检测(Object Detection)

**DETRs Beat YOLOs on Real-time Object Detection**

- Paper: https://arxiv.org/abs/2304.08069
- Code: https://github.com/lyuwenyu/RT-DETR

**Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation**

- Paper: https://arxiv.org/abs/2312.01220
- Code: https://github.com/ZPDu/Boosting-Object-Detection-with-Zero-Shot-Day-Night-Domain-Adaptation 

**YOLO-World: Real-Time Open-Vocabulary Object Detection**

- Paper: https://arxiv.org/abs/2401.17270
- Code: https://github.com/AILab-CVC/YOLO-World

**Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement**

- Paper: https://arxiv.org/abs/2403.16131
- Code: https://github.com/xiuqhou/Salience-DETR

<a name="Anomaly-Detection"></a>

# 异常检测(Anomaly Detection)

**Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection**

- Paper: https://arxiv.org/abs/2310.12790
- Code: https://github.com/mala-lab/AHL

<a name="VT"></a>

# 目标跟踪(Object Tracking)

**Delving into the Trajectory Long-tail Distribution for Muti-object Tracking**

- Paper: https://arxiv.org/abs/2403.04700
- Code: https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT 

<a name="Semantic-Segmentation"></a>

# 语义分割(Semantic Segmentation)

**Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation**

- Paper: https://arxiv.org/abs/2312.04265
- Code: https://github.com/w1oves/Rein

**SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation**

- Paper: https://arxiv.org/abs/2311.15537
- Code: https://github.com/xb534/SED 

<a name="MI"></a>

# 医学图像(Medical Image)

**Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology**

- Paper: https://arxiv.org/abs/2402.17228
- Code: https://github.com/DearCaat/RRT-MIL

**VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis**

- Paper: https://arxiv.org/abs/2402.17300
- Code: https://github.com/Luffy03/VoCo

**ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images**

- Paper: https://arxiv.org/abs/2311.15264
- Code: https://github.com/nicoboou/chada_vit 

<a name="MIS"></a>

# 医学图像分割(Medical Image Segmentation)


<a name="Autonomous-Driving"></a>

# 自动驾驶(Autonomous Driving)

**UniPAD: A Universal Pre-training Paradigm for Autonomous Driving**

- Paper: https://arxiv.org/abs/2310.08370
- Code: https://github.com/Nightmare-n/UniPAD

**Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications**

- Paper: https://arxiv.org/abs/2311.17663
- Code: https://github.com/haomo-ai/Cam4DOcc

**Memory-based Adapters for Online 3D Scene Perception**

- Paper: https://arxiv.org/abs/2403.06974
- Code: https://github.com/xuxw98/Online3D

**Symphonize 3D Semantic Scene Completion with Contextual Instance Queries**

- Paper: https://arxiv.org/abs/2306.15670
- Code: https://github.com/hustvl/Symphonies

**A Real-world Large-scale Dataset for Roadside Cooperative Perception**

- Paper: https://arxiv.org/abs/2403.10145
- Code: https://github.com/AIR-THU/DAIR-RCooper

**Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving**

- Paper: https://arxiv.org/abs/2403.07535
- Code: https://github.com/Junda24/AFNet

**Traffic Scene Parsing through the TSP6K Dataset**

- Paper: https://arxiv.org/abs/2303.02835
- Code: https://github.com/PengtaoJiang/TSP6K 

<a name="3D-Point-Cloud"></a>

# 3D点云(3D Point Cloud)


<a name="3DOD"></a>

# 3D目标检测(3D Object Detection)

**PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection**

- Paper: https://arxiv.org/abs/2312.08371
- Code: https://github.com/kuanchihhuang/PTT

**UniMODE: Unified Monocular 3D Object Detection**

- Paper: https://arxiv.org/abs/2402.18573

<a name="3DSS"></a>

# 3D语义分割(3D Semantic Segmentation)

<a name="Image-Editing"></a>

# 图像编辑(Image Editing)

**Edit One for All: Interactive Batch Image Editing**

- Homepage: https://thaoshibe.github.io/edit-one-for-all 
- Paper: https://arxiv.org/abs/2401.10219
- Code: https://github.com/thaoshibe/edit-one-for-all

<a name="Video-Editing"></a>

# 视频编辑(Video Editing)

**MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers**

- Homepage: [https://maskint.github.io](https://maskint.github.io/)
- Paper: https://arxiv.org/abs/2312.12468

<a name="LLV"></a>

# Low-level Vision

**Residual Denoising Diffusion Models**

- Paper: https://arxiv.org/abs/2308.13712
- Code: https://github.com/nachifur/RDDM

**Boosting Image Restoration via Priors from Pre-trained Models**

- Paper: https://arxiv.org/abs/2403.06793

<a name="SR"></a>

# 超分辨率(Super-Resolution)

**SeD: Semantic-Aware Discriminator for Image Super-Resolution**

- Paper: https://arxiv.org/abs/2402.19387
- Code: https://github.com/lbc12345/SeD

**APISR: Anime Production Inspired Real-World Anime Super-Resolution**

- Paper: https://arxiv.org/abs/2403.01598
- Code: https://github.com/Kiteretsu77/APISR 

<a name="Denoising"></a>

# 去噪(Denoising)

## 图像去噪(Image Denoising)

<a name="3D-Human-Pose-Estimation"></a>

# 3D人体姿态估计(3D Human Pose Estimation)

**Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation**

- Paper: https://arxiv.org/abs/2311.12028
- Code: https://github.com/NationalGAILab/HoT 

<a name="Image-Generation"></a>

# 图像生成(Image Generation)

**InstanceDiffusion: Instance-level Control for Image Generation**

- Homepage: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/
- Paper: https://arxiv.org/abs/2402.03290
- Code: https://github.com/frank-xwang/InstanceDiffusion

**ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations**

- Homepage: https://eclipse-t2i.vercel.app/
- Paper: https://arxiv.org/abs/2312.04655
- Code: https://github.com/eclipse-t2i/eclipse-inference

**Instruct-Imagen: Image Generation with Multi-modal Instruction**

- Paper: https://arxiv.org/abs/2401.01952

**Residual Denoising Diffusion Models**

- Paper: https://arxiv.org/abs/2308.13712
- Code: https://github.com/nachifur/RDDM

**UniGS: Unified Representation for Image Generation and Segmentation**

- Paper: https://arxiv.org/abs/2312.01985

**Multi-Instance Generation Controller for Text-to-Image Synthesis**

- Paper: https://arxiv.org/abs/2402.05408
- Code: https://github.com/limuloo/migc

**SVGDreamer: Text Guided SVG Generation with Diffusion Model**

- Paper: https://arxiv.org/abs/2312.16476
- Code: https://ximinng.github.io/SVGDreamer-project/

**InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model**

- Paper: https://arxiv.org/abs/2312.05849
- Code: https://github.com/jiuntian/interactdiffusion

**Ranni: Taming Text-to-Image Diffusion for Accurate Prompt Following**

- Paper: https://arxiv.org/abs/2311.17002
- Code: https://github.com/ali-vilab/Ranni

<a name="Video-Generation"></a>

# 视频生成(Video Generation)

**Vlogger: Make Your Dream A Vlog**

- Paper: https://arxiv.org/abs/2401.09414
- Code: https://github.com/Vchitect/Vlogger

**VBench: Comprehensive Benchmark Suite for Video Generative Models**

- Homepage: https://vchitect.github.io/VBench-project/ 
- Paper: https://arxiv.org/abs/2311.17982
- Code: https://github.com/Vchitect/VBench

**VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models**

- Homepage: https://video-motion-customization.github.io/
- Paper: https://arxiv.org/abs/2312.00845
- Code: https://github.com/HyeonHo99/Video-Motion-Customization

<a name="3D-Generation"></a>

# 3D生成(3D Generation)

**CityDreamer: Compositional Generative Model of Unbounded 3D Cities**

- Homepage: https://haozhexie.com/project/city-dreamer/ 
- Paper: https://arxiv.org/abs/2309.00610
- Code: https://github.com/hzxie/city-dreamer

**LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching**

- Paper: https://arxiv.org/abs/2311.11284
- Code: https://github.com/EnVision-Research/LucidDreamer 

<a name="Video-Understanding"></a>

# 视频理解(Video Understanding)

**MVBench: A Comprehensive Multi-modal Video Understanding Benchmark**

- Paper: https://arxiv.org/abs/2311.17005
- Code: https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2 

<a name="KD"></a>

# 知识蒸馏(Knowledge Distillation)

**Logit Standardization in Knowledge Distillation**

- Paper: https://arxiv.org/abs/2403.01427
- Code: https://github.com/sunshangquan/logit-standardization-KD

**Efficient Dataset Distillation via Minimax Diffusion**

- Paper: https://arxiv.org/abs/2311.15529
- Code: https://github.com/vimar-gu/MinimaxDiffusion

<a name="Stereo-Matching"></a>

# 立体匹配(Stereo Matching)

**Neural Markov Random Field for Stereo Matching**

- Paper: https://arxiv.org/abs/2403.11193
- Code: https://github.com/aeolusguan/NMRF 

<a name="SGG"></a>

# 场景图生成(Scene Graph Generation)

**HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation**

- Homepage: https://zhangce01.github.io/HiKER-SGG/ 
- Paper: https://arxiv.org/abs/2403.12033
- Code: https://github.com/zhangce01/HiKER-SGG

<a name="Video-Quality-Assessment"></a>

# 视频质量评价(Video Quality Assessment)

**KVQ: Kaleidoscope Video Quality Assessment for Short-form Videos**

- Homepage: https://lixinustc.github.io/projects/KVQ/
- Paper: https://arxiv.org/abs/2402.07220
- Code: https://github.com/lixinustc/KVQ-Challenge-CVPR-NTIRE2024

<a name="Datasets"></a>

# 数据集(Datasets)

**A Real-world Large-scale Dataset for Roadside Cooperative Perception**

- Paper: https://arxiv.org/abs/2403.10145
- Code: https://github.com/AIR-THU/DAIR-RCooper

**Traffic Scene Parsing through the TSP6K Dataset**

- Paper: https://arxiv.org/pdf/2303.02835.pdf
- Code: https://github.com/PengtaoJiang/TSP6K 

<a name="Others"></a>

# 其他(Others)

**Object Recognition as Next Token Prediction**

- Paper: https://arxiv.org/abs/2312.02142
- Code: https://github.com/kaiyuyue/nxtp

**ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks**

- Paper: https://arxiv.org/abs/2306.14525
- Homepage: https://parameternet.github.io/

**Seamless Human Motion Composition with Blended Positional Encodings**

- Paper: https://arxiv.org/abs/2402.15509
- Code: https://github.com/BarqueroGerman/FlowMDM 

**LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning**

- Homepage: https://ll3da.github.io/
- Paper: https://arxiv.org/abs/2311.18651
- Code: https://github.com/Open3DA/LL3DA

**CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update**

- Homepage: https://clova-tool.github.io/ 
- Paper: https://arxiv.org/abs/2312.10908

**MoMask: Generative Masked Modeling of 3D Human Motions**

- Paper: https://arxiv.org/abs/2312.00063
- Code: https://github.com/EricGuo5513/momask-codes

**Amodal Ground Truth and Completion in the Wild**

- Homepage: https://www.robots.ox.ac.uk/~vgg/research/amodal/ 
- Paper: https://arxiv.org/abs/2312.17247
- Code: https://github.com/Championchess/Amodal-Completion-in-the-Wild

**Improved Visual Grounding through Self-Consistent Explanations**

- Paper: https://arxiv.org/abs/2312.04554
- Code: https://github.com/uvavision/SelfEQ

**ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object**

- Homepage: https://chenshuang-zhang.github.io/imagenet_d/
- Paper: https://arxiv.org/abs/2403.18775
- Code: https://github.com/chenshuang-zhang/imagenet_d

**Learning from Synthetic Human Group Activities**

- Homepage: https://cjerry1243.github.io/M3Act/ 
- Paper: https://arxiv.org/abs/2306.16772
- Code: https://github.com/cjerry1243/M3Act

**A Cross-Subject Brain Decoding Framework**

- Homepage: https://littlepure2333.github.io/MindBridge/
- Paper: https://arxiv.org/abs/2404.07850
- Code: https://github.com/littlepure2333/MindBridge

**Multi-Task Dense Prediction via Mixture of Low-Rank Experts**

- Paper: https://arxiv.org/abs/2403.17749
- Code: https://github.com/YuqiYang213/MLoRE

**Contrastive Mean-Shift Learning for Generalized Category Discovery**

- Homepage: https://postech-cvlab.github.io/cms/ 
- Paper: https://arxiv.org/abs/2404.09451
- Code: https://github.com/sua-choi/CMS
  

================================================
FILE: CVPR2025-Papers-with-Code.md
================================================
# CVPR 2025 论文和开源项目合集(Papers with Code)

CVPR 2025 decisions are now available on OpenReview! Acceptance rate: 22.1% (2878 / 13008).


> Note 1: Everyone is welcome to open an issue to share CVPR 2025 papers and open-source projects!
>
> Note 2: For papers from previous top-tier CV conferences, plus other high-quality CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision
>
> - [ICCV 2025](https://github.com/amusi/ICCV2025-Papers-with-Code)
> - [ECCV 2024](https://github.com/amusi/ECCV2024-Papers-with-Code)
> - [CVPR 2024](CVPR2024-Papers-with-Code.md)

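Scan the QR code to join the 【CVer Academic Exchange Group】 and get the latest work, including CVPR 2025! This is the largest computer vision AI Knowledge Planet (知识星球) community. It is updated daily and is among the first to share the newest learning materials on computer vision, AIGC, diffusion models, multimodal learning, deep learning, autonomous driving, medical imaging, remote sensing, and more. Join and start learning!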

![](CVer学术交流群.png)

# 【CVPR 2025 Papers and Open-Source Code: Contents】

- [3DGS(Gaussian Splatting)](#3DGS)
- [Agent](#Agent)
- [Avatars](#Avatars)
- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [Mamba](#Mamba)
- [Embodied AI](#Embodied-AI)
- [GAN](#GAN)
- [GNN](#GNN)
- [多模态大语言模型(MLLM)](#MLLM)
- [大语言模型(LLM)](#LLM)
- [NAS](#NAS)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [DETR](#DETR)
- [扩散模型(Diffusion Models)](#Diffusion)
- [ReID(重识别)](#ReID)
- [长尾分布(Long-Tail)](#Long-Tail)
- [Vision Transformer](#Vision-Transformer)
- [视觉和语言(Vision-Language)](#VL)
- [自监督学习(Self-supervised Learning)](#SSL)
- [数据增强(Data Augmentation)](#DA)
- [目标检测(Object Detection)](#Object-Detection)
- [异常检测(Anomaly Detection)](#Anomaly-Detection)
- [目标跟踪(Visual Tracking)](#VT)
- [语义分割(Semantic Segmentation)](#Semantic-Segmentation)
- [实例分割(Instance Segmentation)](#Instance-Segmentation)
- [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation)
- [医学图像(Medical Image)](#MI)
- [医学图像分割(Medical Image Segmentation)](#MIS)
- [视频目标分割(Video Object Segmentation)](#VOS)
- [视频实例分割(Video Instance Segmentation)](#VIS)
- [参考图像分割(Referring Image Segmentation)](#RIS)
- [图像抠图(Image Matting)](#Matting)
- [图像编辑(Image Editing)](#Image-Editing)
- [Low-level Vision](#LLV)
- [超分辨率(Super-Resolution)](#SR)
- [去噪(Denoising)](#Denoising)
- [去模糊(Deblur)](#Deblur)
- [自动驾驶(Autonomous Driving)](#Autonomous-Driving)
- [3D点云(3D Point Cloud)](#3D-Point-Cloud)
- [3D目标检测(3D Object Detection)](#3DOD)
- [3D语义分割(3D Semantic Segmentation)](#3DSS)
- [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking)
- [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC)
- [3D配准(3D Registration)](#3D-Registration)
- [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation)
- [3D人体Mesh估计(3D Human Mesh Estimation)](#3D-Human-Pose-Estimation)
- [3D Visual Grounding(3D视觉定位)](#3DVG)
- [图像生成(Image Generation)](#Image-Generation)
- [视频生成(Video Generation)](#Video-Generation)
- [3D生成(3D Generation)](#3D-Generation)
- [视频理解(Video Understanding)](#Video-Understanding)
- [行为检测(Action Detection)](#Action-Detection)
- [具身智能(Embodied AI)](#Embodied)
- [文本检测(Text Detection)](#Text-Detection)
- [知识蒸馏(Knowledge Distillation)](#KD)
- [模型剪枝(Model Pruning)](#Pruning)
- [图像压缩(Image Compression)](#IC)
- [三维重建(3D Reconstruction)](#3D-Reconstruction)
- [深度估计(Depth Estimation)](#Depth-Estimation)
- [轨迹预测(Trajectory Prediction)](#TP)
- [车道线检测(Lane Detection)](#Lane-Detection)
- [图像描述(Image Captioning)](#Image-Captioning)
- [视觉问答(Visual Question Answering)](#VQA)
- [手语识别(Sign Language Recognition)](#SLR)
- [视频预测(Video Prediction)](#Video-Prediction)
- [新视点合成(Novel View Synthesis)](#NVS)
- [Zero-Shot Learning(零样本学习)](#ZSL)
- [立体匹配(Stereo Matching)](#Stereo-Matching)
- [特征匹配(Feature Matching)](#Feature-Matching)
- [暗光图像增强(Low-light Image Enhancement)](#Low-light)
- [场景图生成(Scene Graph Generation)](#SGG)
- [风格迁移(Style Transfer)](#ST)
- [隐式神经表示(Implicit Neural Representations)](#INR)
- [图像质量评价(Image Quality Assessment)](#IQA)
- [视频质量评价(Video Quality Assessment)](#Video-Quality-Assessment)
- [压缩感知(Compressive Sensing)](#CS)
- [数据集(Datasets)](#Datasets)
- [新任务(New Tasks)](#New-Tasks)
- [其他(Others)](#Others)

<a name="3DGS"></a>

# 3DGS(Gaussian Splatting)


<a name="Agent"></a>

# Agent

**SpiritSight Agent: Advanced GUI Agent with One Look**

- Paper: https://arxiv.org/abs/2503.03196
- Project: https://hzhiyuan.github.io/SpiritSight-Agent


<a name="Avatars"></a>

# Avatars


<a name="Backbone"></a>

# Backbone

**Building Vision Models upon Heat Conduction**

- Paper: https://arxiv.org/abs/2405.16555
- Code: https://github.com/MzeroMiko/vHeat

**LSNet: See Large, Focus Small**

- Paper: https://arxiv.org/abs/2503.23135
- Code: https://github.com/jameslahm/lsnet


<a name="CLIP"></a>

# CLIP


<a name="Mamba"></a>

# Mamba


**MambaVision: A Hybrid Mamba-Transformer Vision Backbone**

- Paper: https://arxiv.org/abs/2407.08083
- Code: https://github.com/NVlabs/MambaVision

**MobileMamba: Lightweight Multi-Receptive Visual Mamba Network**

- Paper: https://arxiv.org/abs/2411.15941
- Code: https://github.com/lewandofskee/MobileMamba

**MambaIC: State Space Models for High-Performance Learned Image Compression**

- Paper: https://arxiv.org/abs/2503.12461

<a name="Embodied-AI"></a>

# Embodied AI

**CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos**

- Project: https://ai4ce.github.io/CityWalker/
- Paper: https://arxiv.org/abs/2411.17820
- Code: https://github.com/ai4ce/CityWalker


<a name="GAN"></a>

# GAN

<a name="OCR"></a>

# OCR


<a name="NeRF"></a>

# NeRF


<a name="DETR"></a>

# DETR

**Mr. DETR: Instructive Multi-Route Training for Detection Transformers**

- Paper: https://arxiv.org/abs/2412.10028
- Code: https://github.com/Visual-AI/Mr.DETR


<a name="Prompt"></a>

# Prompt

<a name="MLLM"></a>

# 多模态大语言模型(MLLM)

**LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences**

- Paper: https://arxiv.org/abs/2412.01292
- Code: https://github.com/Hoyyyaard/LSceneLLM


**DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution**

- Paper: https://arxiv.org/abs/2405.16071
- Code: https://github.com/callsys/DynRefer


**Retrieval-Augmented Personalization for Multimodal Large Language Models**

- Project Page: https://hoar012.github.io/RAP-Project/
- Paper: https://arxiv.org/abs/2410.13360
- Code: https://github.com/Hoar012/RAP-MLLM

**BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models**

- Paper: https://arxiv.org/abs/2411.15232
- Code: https://github.com/HealthX-Lab/BiomedCoOp

**FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression**

- Paper: https://arxiv.org/abs/2412.04317
- Code: https://github.com/codefanw/FlashSloth

**MMRL: Multi-Modal Representation Learning for Vision-Language Models**

- Paper: https://arxiv.org/abs/2503.08497
- Code: https://github.com/yunncheng/MMRL

**PAVE: Patching and Adapting Video Large Language Models**

- Paper: https://arxiv.org/abs/2503.19794
- Code: https://github.com/dragonlzm/PAVE

**AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization**

- Paper: https://arxiv.org/abs/2503.23733
- Code: https://github.com/THUNLP-MT/AdaMMS


<a name="LLM"></a>

# 大语言模型(LLM)


<a name="NAS"></a>

# NAS

<a name="ReID"></a>

# ReID(重识别)

**From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization**

- Paper: https://arxiv.org/abs/2503.00938
- Code: https://github.com/yuanc3/Pose2ID


**AirRoom: Objects Matter in Room Reidentification**

- Project: https://sairlab.org/airroom/
- Paper: https://arxiv.org/abs/2503.01130


**IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification**

- Paper: https://arxiv.org/abs/2503.10324
- Code: https://github.com/924973292/IDEA


<a name="Diffusion"></a>

# 扩散模型(Diffusion Models)

**TinyFusion: Diffusion Transformers Learned Shallow**

- Paper: https://arxiv.org/abs/2412.01199
- Code: https://github.com/VainF/TinyFusion

**DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture**

- Paper: https://arxiv.org/abs/2409.03550
- Code: https://github.com/qianlong0502/DKDM

**Tiled Diffusion**

- Homepage: https://madaror.github.io/tiled-diffusion.github.io/
- Paper: https://arxiv.org/abs/2412.15185
- Code: https://github.com/madaror/tiled-diffusion


<a name="Vision-Transformer"></a>

# Vision Transformer


<a name="VL"></a>

# 视觉和语言(Vision-Language)

**NLPrompt: Noise-Label Prompt Learning for Vision-Language Models**

- Paper: https://arxiv.org/abs/2412.01256
- Code: https://github.com/qunovo/NLPrompt

**PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability**

- Paper: https://arxiv.org/abs/2503.08481
- Code: https://github.com/unira-zwj/PhysVLM

**MMRL: Multi-Modal Representation Learning for Vision-Language Models**

- Paper: https://arxiv.org/abs/2503.08497
- Code: https://github.com/yunncheng/MMRL


<a name="Object-Detection"></a>

# 目标检测(Object Detection)


**LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models**

- Paper: https://arxiv.org/abs/2501.18954
- Code: https://github.com/iSEE-Laboratory/LLMDet

**Mr. DETR: Instructive Multi-Route Training for Detection Transformers**

- Paper: https://arxiv.org/abs/2412.10028
- Code: https://github.com/Visual-AI/Mr.DETR


<a name="Anomaly-Detection"></a>

# 异常检测(Anomaly Detection)


<a name="VT"></a>

# 目标跟踪(Object Tracking)

**Multiple Object Tracking as ID Prediction**

- Paper: https://arxiv.org/abs/2403.16848
- Code: https://github.com/MCG-NJU/MOTIP

**Omnidirectional Multi-Object Tracking**

- Paper: https://arxiv.org/abs/2503.04565
- Code: https://github.com/xifen523/OmniTrack


<a name="MI"></a>

# 医学图像(Medical Image)


**BrainMVP: Multi-modal Vision Pre-training for Medical Image Analysis**

- Paper: https://arxiv.org/abs/2410.10604
- Code: https://github.com/shaohao011/BrainMVP


<a name="MIS"></a>

# 医学图像分割(Medical Image Segmentation)

**Test-Time Domain Generalization via Universe Learning: A Multi-Graph Matching Approach for Medical Image Segmentation**

- Paper: https://arxiv.org/abs/2503.13012
- Code: https://github.com/Yore0/TTDG-MGM


<a name="Autonomous-Driving"></a>

# 自动驾驶(Autonomous Driving)

**LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes**

- Project: https://ldkong.com/LiMoE
- Paper: https://arxiv.org/abs/2501.04004
- Code: https://github.com/Xiangxu-0103/LiMoE


<a name="3D-Point-Cloud"></a>

# 3D点云(3D Point Cloud)

**Unlocking Generalization Power in LiDAR Point Cloud Registration**

- Paper: https://arxiv.org/abs/2503.10149
- Code: https://github.com/peakpang/UGP


<a name="3DOD"></a>

# 3D目标检测(3D Object Detection)


<a name="3DSS"></a>

# 3D语义分割(3D Semantic Segmentation)


<a name="LLV"></a>

# Low-level Vision


<a name="SR"></a>

# 超分辨率(Super-Resolution)

**AESOP: Auto-Encoded Supervision for Perceptual Image Super-Resolution**

- Paper: https://arxiv.org/abs/2412.00124
- Code: https://github.com/2minkyulee/AESOP-Auto-Encoded-Supervision-for-Perceptual-Image-Super-Resolution


<a name="Denoising"></a>

# 去噪(Denoising)

## 图像去噪(Image Denoising)

<a name="3D-Human-Pose-Estimation"></a>

# 3D人体姿态估计(3D Human Pose Estimation)

**Reconstructing Humans with a Biomechanically Accurate Skeleton**

- Homepage: https://isshikihugh.github.io/HSMR/
- Code: https://github.com/IsshikiHugh/HSMR

<a name="3DVG"></a>

# 3D Visual Grounding(3D视觉定位)

**ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding**

- Homepage: https://pqh22.github.io/projects/ProxyTransformation/index.html
- Paper: https://arxiv.org/abs/2502.19247
- Code: https://github.com/pqh22/ProxyTransformation


<a name="Image-Generation"></a>

# 图像生成(Image Generation)

**Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models**

- Paper: https://arxiv.org/abs/2501.01423
- Code: https://github.com/hustvl/LightningDiT

**SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models**

- Paper: https://arxiv.org/abs/2412.04852
- Code: https://github.com/taco-group/SleeperMark


**TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation**

- Homepage: https://byteflow-ai.github.io/TokenFlow/
- Code: https://github.com/ByteFlow-AI/TokenFlow
- Paper: https://arxiv.org/abs/2412.03069

**PAR: Parallelized Autoregressive Visual Generation**

- Project: https://epiphqny.github.io/PAR-project/
- Paper: https://arxiv.org/abs/2412.15119
- Code: https://github.com/Epiphqny/PAR


**Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis**

- Project: https://generative-photography.github.io/project/
- Paper: https://arxiv.org/abs/2412.02168
- Code: https://github.com/pandayuanyu/generative-photography


**OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation**

- Project Page: https://opening-benchmark.github.io/
- Paper: https://arxiv.org/abs/2411.18499
- Code: https://github.com/LanceZPF/OpenING


<a name="Video-Generation"></a>

# 视频生成(Video Generation)

**Identity-Preserving Text-to-Video Generation by Frequency Decomposition**

- Paper: https://arxiv.org/abs/2411.17440
- Code: https://github.com/PKU-YuanGroup/ConsisID


**Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models**

- Paper: https://arxiv.org/abs/2407.15642
- Code: https://github.com/maxin-cn/Cinemo

**X-Dyna: Expressive Dynamic Human Image Animation**

- Paper: https://arxiv.org/abs/2501.10021
- Code: https://github.com/bytedance/X-Dyna

**PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation**

- Paper: https://arxiv.org/pdf/2412.00596
- Code: https://github.com/pittisl/PhyT2V


**Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model**

- Project: https://liewfeng.github.io/TeaCache/
- Paper: https://arxiv.org/abs/2411.19108
- Code: https://github.com/ali-vilab/TeaCache


**AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion**

- Project: https://iva-mzsun.github.io/AR-Diffusion
- Paper: https://arxiv.org/abs/2503.07418
- Code: https://github.com/iva-mzsun/AR-Diffusion


<a name="Image-Editing"></a>

# 图像编辑(Image Editing)

**Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing**

- Paper: https://arxiv.org/abs/2411.16832
- Code: https://github.com/taco-group/FaceLock


**h-Edit: Effective and Flexible Diffusion-Based Editing via Doob’s h-Transform**

- Paper: https://arxiv.org/abs/2503.02187
- Code: https://github.com/nktoan/h-edit


<a name="Video-Editing"></a>

# 视频编辑(Video Editing)


<a name="3D-Generation"></a>

# 3D生成(3D Generation)


**Generative Gaussian Splatting for Unbounded 3D City Generation**

- Project: https://haozhexie.com/project/gaussian-city
- Paper: https://arxiv.org/abs/2406.06526
- Code: https://github.com/hzxie/GaussianCity

**StdGEN: Semantic-Decomposed 3D Character Generation from Single Images**

- Project: https://stdgen.github.io/
- Paper: https://arxiv.org/abs/2411.05738
- Code: https://github.com/hyz317/StdGEN


<a name="3D-Reconstruction"></a>

# 3D重建(3D Reconstruction)

**Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass**

- Project: https://fast3r-3d.github.io/
- Paper: https://arxiv.org/abs/2501.13928


<a name="HMG"></a>

# 人体运动生成(Human Motion Generation)

**SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance**

- Project: https://4dvlab.github.io/project_page/semgeomo/
- Paper: https://arxiv.org/abs/2503.01291
- Code: https://github.com/4DVLab/SemGeoMo

<a name="Video-Understanding"></a>

# 视频理解(Video Understanding)

**Temporal Grounding Videos like Flipping Manga**

- Paper: https://arxiv.org/abs/2411.10332
- Code: https://github.com/yongliang-wu/NumPro

<a name="Embodied"></a>

# 具身智能(Embodied AI)

**Universal Actions for Enhanced Embodied Foundation Models**

- Project: https://2toinf.github.io/UniAct/
- Paper: https://arxiv.org/abs/2501.10105
- Code: https://github.com/2toinf/UniAct

**PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability**

- Paper: https://arxiv.org/abs/2503.08481
- Code: https://github.com/unira-zwj/PhysVLM


<a name="KD"></a>

# 知识蒸馏(Knowledge Distillation)

<a name="Depth-Estimation"></a>


# 深度估计(Depth Estimation)

**DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos**

- Project: https://depthcrafter.github.io
- Paper: https://arxiv.org/abs/2409.02095
- Code: https://github.com/Tencent/DepthCrafter


**MonSter: Marry Monodepth to Stereo Unleashes Power**

- Paper: https://arxiv.org/abs/2501.08643
- Code: https://github.com/Junda24/MonSter

**DEFOM-Stereo: Depth Foundation Model Based Stereo Matching**

- Project: https://insta360-research-team.github.io/DEFOM-Stereo/
- Paper: https://arxiv.org/abs/2501.09466
- Code: https://github.com/Insta360-Research-Team/DEFOM-Stereo


<a name="Stereo-Matching"></a>

# 立体匹配(Stereo Matching)

**MonSter: Marry Monodepth to Stereo Unleashes Power**

- Paper: https://arxiv.org/abs/2501.08643
- Code: https://github.com/Junda24/MonSter


<a name="Low-light"></a>

# 暗光图像增强(Low-light Image Enhancement)


**HVI: A New color space for Low-light Image Enhancement**

- Paper: https://arxiv.org/abs/2502.20272
- Code: https://github.com/Fediory/HVI-CIDNet
- Demo: https://huggingface.co/spaces/Fediory/HVI-CIDNet_Low-light-Image-Enhancement_

**ReDDiT: Efficient Diffusion as Low Light Enhancer**

- Paper: https://arxiv.org/abs/2410.12346
- Code: https://github.com/lgz-0713/ReDDiT


<a name="IC"></a>

# 图像压缩(Image Compression)

**MambaIC: State Space Models for High-Performance Learned Image Compression**

- Paper: https://arxiv.org/abs/2503.12461


<a name="SGG"></a>

# 场景图生成(Scene Graph Generation)


<a name="ST"></a>

# 风格迁移(Style Transfer)

**StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements**

- Project: https://stylestudio-official.github.io/
- Paper: https://arxiv.org/abs/2412.08503
- Code: https://github.com/Westlake-AGI-Lab/StyleStudio


<a name="IQA"></a>

# 图像质量评价(Image Quality Assessment)

**Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language**

- Homepage: https://yichengchen24.github.io/projects/autocherrypicker
- Paper: https://arxiv.org/pdf/2406.20085
- Code: https://github.com/yichengchen24/ACP

<a name="Video-Quality-Assessment"></a>

# 视频质量评价(Video Quality Assessment)

<a name="CS"></a>

# 压缩感知(Compressive Sensing)

**Using Powerful Prior Knowledge of Diffusion Model in Deep Unfolding Networks for Image Compressive Sensing**

- Paper: https://arxiv.org/abs/2503.08429
- Code: https://github.com/FengodChen/DMP-DUN-CVPR2025


<a name="Datasets"></a>

# 数据集(Datasets)


**Objaverse++: Curated 3D Object Dataset with Quality Annotations**

- Paper: https://arxiv.org/abs/2504.07334
- Code: https://github.com/TCXX/ObjaversePlusPlus


<a name="Others"></a>

# 其他(Others)


**DTGBrepGen: A Novel B-rep Generative Model through Decoupling Topology and Geometry**

- Paper: https://arxiv.org/abs/2503.13110
- Code: https://github.com/jinli99/DTGBrepGen


**Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation**

- Paper: https://arxiv.org/abs/2503.19307
- Code: https://github.com/delaprada/HandSynthesis.git

**EVOS: Efficient Implicit Neural Training via EVOlutionary Selector**

- Homepage: https://weixiang-zhang.github.io/proj-evos/
- Paper: https://arxiv.org/abs/2412.10153
- Code: https://github.com/zwx-open/EVOS-INR
  

================================================
FILE: README.md
================================================
# CVPR 2026 论文和开源项目合集(Papers with Code)

CVPR 2026 decisions are now available on OpenReview! Acceptance rate: 25.42% (4090 / 16092).


> Note 1: Everyone is welcome to open an issue to share CVPR 2026 papers and open-source projects!
>
> Note 2: For papers from previous top-tier CV conferences, plus other high-quality CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision
>
> - [ICCV 2025](https://github.com/amusi/ICCV2025-Papers-with-Code)
> - [ECCV 2024](https://github.com/amusi/ECCV2024-Papers-with-Code)


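Scan the QR code to join the 【CVer Academic Exchange Group】 and get the latest work, including CVPR 2026! This is the largest computer vision AI Knowledge Planet (知识星球) community. It is updated daily and is among the first to share the newest learning materials on computer vision, AIGC, diffusion models, multimodal learning, deep learning, autonomous driving, medical imaging, remote sensing, and more. Join and start learning!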

![](CVer学术交流群.png)

# 【CVPR 2026 Papers and Open-Source Code: Contents】

- [3DGS(Gaussian Splatting)](#3DGS)
- [Agent](#Agent)
- [Avatars](#Avatars)
- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [Mamba](#Mamba)
- [Embodied AI](#Embodied-AI)
- [GAN](#GAN)
- [GNN](#GNN)
- [多模态大语言模型(MLLM)](#MLLM)
- [大语言模型(LLM)](#LLM)
- [空间智能(Spatial Intelligence)](#SI)
- [NAS](#NAS)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [DETR](#DETR)
- [扩散模型(Diffusion Models)](#Diffusion)
- [ReID(重识别)](#ReID)
- [长尾分布(Long-Tail)](#Long-Tail)
- [Vision Transformer](#Vision-Transformer)
- [视觉和语言(Vision-Language)](#VL)
- [自监督学习(Self-supervised Learning)](#SSL)
- [数据增强(Data Augmentation)](#DA)
- [目标检测(Object Detection)](#Object-Detection)
- [异常检测(Anomaly Detection)](#Anomaly-Detection)
- [目标跟踪(Visual Tracking)](#VT)
- [语义分割(Semantic Segmentation)](#Semantic-Segmentation)
- [实例分割(Instance Segmentation)](#Instance-Segmentation)
- [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation)
- [医学图像(Medical Image)](#MI)
- [医学图像分割(Medical Image Segmentation)](#MIS)
- [视频目标分割(Video Object Segmentation)](#VOS)
- [视频实例分割(Video Instance Segmentation)](#VIS)
- [参考图像分割(Referring Image Segmentation)](#RIS)
- [图像抠图(Image Matting)](#Matting)
- [图像编辑(Image Editing)](#Image-Editing)
- [Low-level Vision](#LLV)
- [超分辨率(Super-Resolution)](#SR)
- [去噪(Denoising)](#Denoising)
- [去模糊(Deblur)](#Deblur)
- [自动驾驶(Autonomous Driving)](#Autonomous-Driving)
- [3D点云(3D Point Cloud)](#3D-Point-Cloud)
- [3D目标检测(3D Object Detection)](#3DOD)
- [3D语义分割(3D Semantic Segmentation)](#3DSS)
- [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking)
- [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC)
- [3D配准(3D Registration)](#3D-Registration)
- [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation)
- [3D人体Mesh估计(3D Human Mesh Estimation)](#3D-Human-Pose-Estimation)
- [3D Visual Grounding(3D视觉定位)](#3DVG)
- [图像生成(Image Generation)](#Image-Generation)
- [视频生成(Video Generation)](#Video-Generation)
- [3D生成(3D Generation)](#3D-Generation)
- [视频理解(Video Understanding)](#Video-Understanding)
- [行为检测(Action Detection)](#Action-Detection)
- [遥感(Remote)](#Remote)
- [文本检测(Text Detection)](#Text-Detection)
- [知识蒸馏(Knowledge Distillation)](#KD)
- [模型剪枝(Model Pruning)](#Pruning)
- [图像压缩(Image Compression)](#IC)
- [视频压缩(Video Compression)](#VC)
- [三维重建(3D Reconstruction)](#3D-Reconstruction)
- [深度估计(Depth Estimation)](#Depth-Estimation)
- [轨迹预测(Trajectory Prediction)](#TP)
- [车道线检测(Lane Detection)](#Lane-Detection)
- [图像描述(Image Captioning)](#Image-Captioning)
- [视觉问答(Visual Question Answering)](#VQA)
- [手语识别(Sign Language Recognition)](#SLR)
- [视频预测(Video Prediction)](#Video-Prediction)
- [新视点合成(Novel View Synthesis)](#NVS)
- [Zero-Shot Learning(零样本学习)](#ZSL)
- [立体匹配(Stereo Matching)](#Stereo-Matching)
- [特征匹配(Feature Matching)](#Feature-Matching)
- [暗光图像增强(Low-light Image Enhancement)](#Low-light)
- [场景图生成(Scene Graph Generation)](#SGG)
- [图像检索(Image Retrieval)](#Image-Retrieval)
- [风格迁移(Style Transfer)](#ST)
- [隐式神经表示(Implicit Neural Representations)](#INR)
- [图像质量评价(Image Quality Assessment)](#IQA)
- [视频质量评价(Video Quality Assessment)](#Video-Quality-Assessment)
- [压缩感知(Compressive Sensing)](#CS)
- [数据集(Datasets)](#Datasets)
- [新任务(New Tasks)](#New-Tasks)
- [其他(Others)](#Others)

<a name="3DGS"></a>

# 3DGS(Gaussian Splatting)

**Dropping Anchor and Spherical Harmonics for Sparse-view Gaussian Splatting**

- Paper: https://arxiv.org/abs/2602.20933
- Code: 
- Project: https://sk-fun.fun/DropAnSH-GS

**Topology-Aware Gaussian Splatting for Dynamic Mesh Modeling and Tracking**

- Paper: https://arxiv.org/abs/2512.01329
- Project: https://haza628.github.io/tagSplat/

**FastGS: Training 3D Gaussian Splatting in 100 Seconds**

- Paper: https://arxiv.org/pdf/2511.04283
- Code: https://github.com/fastgs/FastGS
- Project: https://fastgs.github.io/


<a name="Agent"></a>

# Agent


<a name="Avatars"></a>

# Avatars


<a name="Backbone"></a>

# Backbone


<a name="CLIP"></a>

# CLIP


<a name="Mamba"></a>

# Mamba


<a name="GAN"></a>

# GAN

<a name="OCR"></a>

# OCR


<a name="NeRF"></a>

# NeRF


<a name="DETR"></a>

# DETR


<a name="Prompt"></a>

# Prompt

<a name="MLLM"></a>

# 多模态大语言模型(MLLM)

**Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking**

- Paper: https://arxiv.org/abs/2602.20330
- Code: https://github.com/UIUC-MONET/vlm-circuit-tracing

**UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark**

- Paper: https://arxiv.org/abs/2603.05075
- Code: 
- Project: https://any2any-mllm.github.io/unim/


<a name="LLM"></a>

# 大语言模型(LLM)


<a name="Embodied-AI"></a>


# 具身智能(Embodied AI)

**Wanderland: Geometrically Grounded Simulation for Open-World Embodied AI**

- Paper: https://arxiv.org/abs/2511.20620
- Code: https://github.com/ai4ce/wanderland
- Project: https://ai4ce.github.io/wanderland/


<a name="SI"></a>


# 空间智能(Spatial Intelligence)

**Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning**

- Paper: https://arxiv.org/abs/2510.27606
- Code: https://github.com/InternLM/Spatial-SSRL
- Model: https://huggingface.co/internlm/Spatial-SSRL-7B


<a name="NAS"></a>

# NAS

<a name="ReID"></a>

# ReID(重识别)


**MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-Identification**

- Paper: https://arxiv.org/abs/2512.03404
- Code: https://github.com/yjzhao1019/MOS


<a name="Diffusion"></a>

# 扩散模型(Diffusion Models)


<a name="Vision-Transformer"></a>

# Vision Transformer


<a name="VL"></a>

# 视觉和语言(Vision-Language)

**StructXLIP: Enhancing Vision-language Models with Multimodal Structural Cues**

- Paper: https://arxiv.org/abs/2602.20089
- Code: https://github.com/intelligolabs/StructXLIP

**ApET: Approximation-Error Guided Token Compression for Efficient VLMs**

- Paper: https://arxiv.org/abs/2602.19870
- Code: https://github.com/MaQianKun0/ApET

**Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking**

- Paper: https://arxiv.org/abs/2602.20330
- Code: https://github.com/UIUC-MONET/vlm-circuit-tracing


<a name="Object-Detection"></a>

# 目标检测(Object Detection)


<a name="Anomaly-Detection"></a>

# 异常检测(Anomaly Detection)


<a name="VT"></a>

# 目标跟踪(Object Tracking)


<a name="MI"></a>

# 医学图像(Medical Image)


<a name="MIS"></a>

# 医学图像分割(Medical Image Segmentation)

**MedCLIPSeg: Probabilistic Vision–Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation**

- Paper: https://arxiv.org/abs/2602.20423
- Code: https://github.com/HealthX-Lab/MedCLIPSeg
- Project: https://tahakoleilat.github.io/MedCLIPSeg

<a name="Autonomous-Driving"></a>

# 自动驾驶(Autonomous Driving)

**Open-Vocabulary Domain Generalization in Urban-Scene Segmentation**

- Paper: https://arxiv.org/pdf/2602.18853
- Code: https://github.com/DZhaoXd/s2_corr

**U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences**

- Paper: https://arxiv.org/abs/2512.02982
- Code: https://github.com/worldbench/U4D


# 3D点云(3D Point Cloud)

**CLIPoint3D: Language-Grounded Few-Shot Unsupervised 3D Point Cloud Domain Adaptation**

- Paper: https://arxiv.org/abs/2602.20409
- Code: https://github.com/SarthakM320/CLIPoint3D


<a name="3DOD"></a>

# 3D目标检测(3D Object Detection)


<a name="3DOD"></a>

# 3D语义分割(3D Semantic Segmentation)


<a name="LLV"></a>

# Low-level Vision


<a name="SR"></a>

# 超分辨率(Super-Resolution)


<a name="Denoising"></a>

# 去噪(Denoising)

## 图像去噪(Image Denoising)

<a name="3D-Human-Pose-Estimation"></a>

# 3D人体姿态估计(3D Human Pose Estimation)


<a name="3DVG"></a>

# 3D视觉定位(3D Visual Grounding)


<a name="Image-Generation"></a>

# 图像生成(Image Generation)


**ExpPortrait: Expressive Portrait Generation via Personalized Representation**

- Paper: https://arxiv.org/abs/2602.19900
- Code: 


<a name="Video-Generation"></a>

# 视频生成(Video Generation)


<a name="Image-Editing"></a>

# 图像编辑(Image Editing)


<a name="Video-Editing"></a>

# 视频编辑(Video Editing)


<a name="3D-Generation"></a>

# 3D生成(3D Generation)


<a name="3D-Reconstruction"></a>

# 3D重建(3D Reconstruction)

**tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction**

- Project: https://cwchenwang.github.io/tttLRM/
- Paper: https://arxiv.org/abs/2602.20160
- Code: https://github.com/cwchenwang/tttLRM

**Flow3r: Factored Flow Prediction for Scalable Visual Geometry Learning**

- Project: https://flow3r-project.github.io/
- Paper: https://arxiv.org/abs/2602.20157
- Code: https://github.com/Kidrauh/flow3r

**RAP: Fast Feedforward Rendering-Free Attribute-Guided Primitive Importance Score Prediction for Efficient 3D Gaussian Splatting Processing**

- Paper: https://arxiv.org/abs/2602.19753
- Code: https://github.com/yyyykf/RAP


<a name="HMG"></a>

# 人体运动生成(Human Motion Generation)

<a name="Video-Understanding"></a>

# 视频理解(Video Understanding)


<a name="Remote"></a>

# 遥感(Remote Sensing)

**Brewing Stronger Features: Dual-Teacher Distillation for Multispectral Earth Observation**

- Paper: https://arxiv.org/abs/2602.19863
- Code: None


<a name="KD"></a>

# 知识蒸馏(Knowledge Distillation)

<a name="Depth-Estimation"></a>


# 深度估计(Depth Estimation)


<a name="Stereo-Matching"></a>

# 立体匹配(Stereo Matching)


<a name="Low-light"></a>

# 暗光图像增强(Low-light Image Enhancement)


<a name="IC"></a>

# 图像压缩(Image Compression)


<a name="VC"></a>

# 视频压缩(Video Compression)

**UniComp: Rethinking Video Compression Through Informational Uniqueness**

- Paper: https://arxiv.org/abs/2512.03575
- Code: https://github.com/TimeMarker-LLM/UniComp


<a name="SGG"></a>

# 场景图生成(Scene Graph Generation)


<a name="Image-Retrieval"></a>

# 图像检索(Image Retrieval)

**PinPoint: Evaluation of Composed Image Retrieval with Explicit Negatives, Multi-Image Queries, and Paraphrase Testing**

- Paper: https://arxiv.org/abs/2603.04598
- Code: 


<a name="ST"></a>

# 风格迁移(Style Transfer)


<a name="IQA"></a>

# 图像质量评价(Image Quality Assessment)


<a name="Video-Quality-Assessment"></a>

# 视频质量评价(Video Quality Assessment)

<a name="CS"></a>

# 压缩感知(Compressive Sensing)


<a name="Datasets"></a>

# 数据集(Datasets)


<a name="Others"></a>

# 其他(Others)

**Decoupling Defense Strategies for Robust Image Watermarking**

- Paper: https://arxiv.org/abs/2602.20053
- Code: None

**Multi-Modal Representation Learning via Semi-Supervised Rate Reduction for Generalized Category Discovery**

- Paper: https://arxiv.org/abs/2602.19910
- Code: 

**The Invisible Gorilla Effect in Out-of-distribution Detection**

- Paper: https://arxiv.org/abs/2602.20068
- Code: https://github.com/HarryAnthony/Invisible_Gorilla_Effect

**SimLBR: Learning to Detect Fake Images by Learning to Detect Real Images**

- Paper: https://arxiv.org/abs/2602.20412
- Code: 

**RecoverMark: Robust Watermarking for Localization and Recovery of Manipulated Faces**

- Paper: https://arxiv.org/abs/2602.20618
- Code: 

**Probing and Bridging Geometry-Interaction Cues for Affordance Reasoning in Vision Foundation Models**

- Paper:
- Code: 

**GEM-TFL: Bridging Weak and Full Supervision for Forgery Localization through EM-Guided Decomposition and Temporal Refinement**

- Paper: https://arxiv.org/abs/2603.05095
- Code: 


**FOZO: Forward-Only Zeroth-Order Prompt Optimization for Test-Time Adaptation**

- Paper: https://arxiv.org/abs/2603.04733
- Code: https://github.com/eVI-group-SCU/FOZO

**Mitigating Instance Entanglement in Instance-Dependent Partial Label Learning**

- Paper: https://arxiv.org/abs/2603.04825
- Code: https://github.com/RyanZhaoIc/CAD

  
================================================
FILE: master
================================================