Repository: amusi/CVPR2026-Papers-with-Code
Branch: main
Commit: 5709455e269a
Files: 9
Total size: 316.0 KB
Directory structure:
gitextract_f2cckni0/
├── CVPR2019-Papers-with-Code.md
├── CVPR2020-Papers-with-Code.md
├── CVPR2021-Papers-with-Code.md
├── CVPR2022-Papers-with-Code.md
├── CVPR2023-Papers-with-Code.md
├── CVPR2024-Papers-with-Code.md
├── CVPR2025-Papers-with-Code.md
├── README.md
└── master
================================================
FILE CONTENTS
================================================
================================================
FILE: CVPR2019-Papers-with-Code.md
================================================
# CVPR2019-Code
A collection of open-source projects for CVPR 2019 papers
See also: [CVPR 2020 open-source paper collection](https://github.com/amusi/CVPR2020-Code)
Appendix: [Code links for 530 CVPR 2019 papers](./CVPR2019_CodeLink.csv)
- [Object Detection](#Object-Detection)
- [Object Tracking](#Object-Tracking)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [GAN](#GAN)
- [Face Detection](#Face-Detection)
- [Human Pose Estimation](#Human-Pose-Estimation)
- [6DoF Pose Estimation](#6DoF-Pose-Estimation)
- [Head Pose Estimation](#Head-Pose-Estimation)
- [Crowd Density Estimation](#Crowd-Counting)
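**Changelog:**
- 20200226: added [CVPR 2020 open-source paper collection](https://github.com/amusi/CVPR2020-Code)
- 20191026: added [code links for 530 papers](./CVPR2019_CodeLink.csv)
- 20190405: added 8 papers (object detection, semantic segmentation, etc.)
- 20190408: added 6 papers (object tracking, GAN, 6DoF pose estimation, etc.)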
# Object Detection
**Bounding Box Regression with Uncertainty for Accurate Object Detection**
- arXiv:
- github:
# Object Tracking
**Fast Online Object Tracking and Segmentation: A Unifying Approach**
- arXiv:
- github:
- homepage:
**Unsupervised Deep Tracking**
- arXiv:
- github:
- github(PyTorch):
**Target-Aware Deep Tracking**
- arXiv:
- homepage:
# Semantic Segmentation
**Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation**
- arXiv:
- github: [https://github.com/LinZhuoChen/DUpsampling](https://github.com/LinZhuoChen/DUpsampling) (unofficial)
**Dual Attention Network for Scene Segmentation**
- arXiv:
- github:
**Collaborative Global-Local Networks for Memory-Efficient Segmentation of Ultra-High Resolution Images**
- arXiv:None
- github:
# Instance Segmentation
**Mask Scoring R-CNN**
- arXiv:
- github:
# GAN
**Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis**
- arXiv:
- github:
# Face Detection
**DSFD: Dual Shot Face Detector**
- arXiv:
- github:
# Human Pose Estimation
**Deep High-Resolution Representation Learning for Human Pose Estimation**
- arXiv:
- github:
# 6DoF Pose Estimation
**PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation**
- arXiv:
- github:
# Head Pose Estimation
**PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation**
- paper:
- github:
# Crowd Density Estimation
**Learning from Synthetic Data for Crowd Counting in the Wild**
- arXiv:
- github:
- homepage:
================================================
FILE: CVPR2020-Papers-with-Code.md
================================================
# CVPR2020-Code
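A collection of open-source projects for [CVPR 2020](https://openaccess.thecvf.com/CVPR2020) papers. Issues sharing CVPR 2020 open-source projects are welcome.
**Recommended reading:**
- [CVPR 2020 virtual](http://cvpr20.com/)
- The ECCV 2020 open-source paper collection is available: https://github.com/amusi/ECCV2020-Code
- For papers from previous top CV conferences (e.g., ECCV 2020, CVPR 2019, ICCV 2019), other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision
**CVPR 2020 Papers with Code: Table of Contents**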
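- [CNN](#CNN)
- [Image Classification](#Image-Classification)
- [Video Classification](#Video-Classification)
- [Object Detection](#Object-Detection)
- [3D Object Detection](#3D-Object-Detection)
- [Video Object Detection](#Video-Object-Detection)
- [Object Tracking](#Object-Tracking)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Panoptic Segmentation](#Panoptic-Segmentation)
- [Video Object Segmentation](#VOS)
- [Superpixel Segmentation](#Superpixel)
- [Interactive Image Segmentation](#IIS)
- [NAS](#NAS)
- [GAN](#GAN)
- [Re-ID](#Re-ID)
- [3D Point Clouds (Classification/Segmentation/Registration/Tracking, etc.)](#3D-PointCloud)
- [Face (Recognition/Detection/Reconstruction, etc.)](#Face)
- [Human Pose Estimation (2D/3D)](#Human-Pose-Estimation)
- [Human Parsing](#Human-Parsing)
- [Scene Text Detection](#Scene-Text-Detection)
- [Scene Text Recognition](#Scene-Text-Recognition)
- [Feature (Point) Detection and Description](#Feature)
- [Super-Resolution](#Super-Resolution)
- [Model Compression/Pruning](#Model-Compression)
- [Video Understanding/Action Recognition](#Action-Recognition)
- [Crowd Counting](#Crowd-Counting)
- [Depth Estimation](#Depth-Estimation)
- [6D Object Pose Estimation](#6DOF)
- [Hand Pose Estimation](#Hand-Pose)
- [Saliency Detection](#Saliency)
- [Denoising](#Denoising)
- [Deraining](#Deraining)
- [Deblurring](#Deblurring)
- [Dehazing](#Dehazing)
- [Feature Point Detection and Description](#Feature)
- [Visual Question Answering (VQA)](#VQA)
- [Video Question Answering (VideoQA)](#VideoQA)
- [Vision-and-Language Navigation](#VLN)
- [Video Compression](#Video-Compression)
- [Video Frame Interpolation](#Video-Frame-Interpolation)
- [Style Transfer](#Style-Transfer)
- [Lane Detection](#Lane-Detection)
- [Human-Object Interaction (HOI) Detection](#HOI)
- [Trajectory Prediction](#TP)
- [Motion Prediction](#Motion-Predication)
- [Optical Flow Estimation](#OF)
- [Image Retrieval](#IR)
- [Virtual Try-On](#Virtual-Try-On)
- [HDR](#HDR)
- [Adversarial Examples](#AE)
- [3D Reconstruction](#3D-Reconstructing)
- [Depth Completion](#DC)
- [Semantic Scene Completion](#SSC)
- [Image/Video Captioning](#Captioning)
- [Wireframe Parsing](#WP)
- [Datasets](#Datasets)
- [Others](#Others)
- [Acceptance Unconfirmed](#Not-Sure)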
# CNN
**Exploring Self-attention for Image Recognition**
- Paper: https://hszhao.github.io/papers/cvpr20_san.pdf
- Code: https://github.com/hszhao/SAN
**Improving Convolutional Networks with Self-Calibrated Convolutions**
- Homepage: https://mmcheng.net/scconv/
- Paper: http://mftp.mmcheng.net/Papers/20cvprSCNet.pdf
- Code: https://github.com/backseason/SCNet
**Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets**
- Paper: https://arxiv.org/abs/2003.13549
- Code: https://github.com/zeiss-microscopy/BSConv
# Image Classification
**Interpretable and Accurate Fine-grained Recognition via Region Grouping**
- Paper: https://arxiv.org/abs/2005.10411
- Code: https://github.com/zxhuang1698/interpretability-by-parts
**Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion**
- Paper: https://arxiv.org/abs/2003.04490
- Code: https://github.com/AdamKortylewski/CompositionalNets
**Spatially Attentive Output Layer for Image Classification**
- Paper: https://arxiv.org/abs/2004.07570
- Code (appears to have been deleted by the original author): https://github.com/ildoonet/spatially-attentive-output-layer
# Video Classification
**SmallBigNet: Integrating Core and Contextual Views for Video Classification**
- Paper: https://arxiv.org/abs/2006.14582
- Code: https://github.com/xhl-video/SmallBigNet
# Object Detection
**Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Li_Overcoming_Classifier_Imbalance_for_Long-Tail_Object_Detection_With_Balanced_Group_CVPR_2020_paper.pdf
- Code: https://github.com/FishYuLi/BalancedGroupSoftmax
**AugFPN: Improving Multi-scale Feature Learning for Object Detection**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Guo_AugFPN_Improving_Multi-Scale_Feature_Learning_for_Object_Detection_CVPR_2020_paper.pdf
- Code: https://github.com/Gus-Guo/AugFPN
**Noise-Aware Fully Webly Supervised Object Detection**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Shen_Noise-Aware_Fully_Webly_Supervised_Object_Detection_CVPR_2020_paper.html
- Code: https://github.com/shenyunhang/NA-fWebSOD/
**Learning a Unified Sample Weighting Network for Object Detection**
- Paper: https://arxiv.org/abs/2006.06568
- Code: https://github.com/caiqi/sample-weighting-network
**D2Det: Towards High Quality Object Detection and Instance Segmentation**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Cao_D2Det_Towards_High_Quality_Object_Detection_and_Instance_Segmentation_CVPR_2020_paper.pdf
- Code: https://github.com/JialeCao001/D2Det
**Dynamic Refinement Network for Oriented and Densely Packed Object Detection**
- Paper: https://arxiv.org/abs/2005.09973
- Code and dataset: https://github.com/Anymake/DRN_CVPR2020
**Scale-Equalizing Pyramid Convolution for Object Detection**
- Paper: https://arxiv.org/abs/2005.03101
- Code: https://github.com/jshilong/SEPC
**Revisiting the Sibling Head in Object Detector**
- Paper: https://arxiv.org/abs/2003.07540
- Code: https://github.com/Sense-X/TSD
**Detection in Crowded Scenes: One Proposal, Multiple Predictions**
- Paper: https://arxiv.org/abs/2003.09163
- Code: https://github.com/megvii-model/CrowdDetection
**Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection**
- Paper: https://arxiv.org/abs/2004.04725
- Code: https://github.com/NVlabs/wetectron
**Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection**
- Paper: https://arxiv.org/abs/1912.02424
- Code: https://github.com/sfzhang15/ATSS
**BiDet: An Efficient Binarized Object Detector**
- Paper: https://arxiv.org/abs/2003.03961
- Code: https://github.com/ZiweiWangTHU/BiDet
**Harmonizing Transferability and Discriminability for Adapting Object Detectors**
- Paper: https://arxiv.org/abs/2003.06297
- Code: https://github.com/chaoqichen/HTCN
**CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection**
- Paper: https://arxiv.org/abs/2003.09119
- Code: https://github.com/KiveeDong/CentripetalNet
**Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection**
- Paper: https://arxiv.org/abs/2003.11818
- Code: https://github.com/ggjy/HitDet.pytorch
**EfficientDet: Scalable and Efficient Object Detection**
- Paper: https://arxiv.org/abs/1911.09070
- Code: https://github.com/google/automl/tree/master/efficientdet
# 3D Object Detection
**SESS: Self-Ensembling Semi-Supervised 3D Object Detection**
- Paper: https://arxiv.org/abs/1912.11803
- Code: https://github.com/Na-Z/sess
**Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection**
- Paper: https://arxiv.org/abs/2006.04356
- Code: https://github.com/dleam/Associate-3Ddet
**What You See is What You Get: Exploiting Visibility for 3D Object Detection**
- Homepage: https://www.cs.cmu.edu/~peiyunh/wysiwyg/
- Paper: https://arxiv.org/abs/1912.04986
- Code: https://github.com/peiyunh/wysiwyg
**Learning Depth-Guided Convolutions for Monocular 3D Object Detection**
- Paper: https://arxiv.org/abs/1912.04799
- Code: https://github.com/dingmyu/D4LCN
**Structure Aware Single-stage 3D Object Detection from Point Cloud**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/He_Structure_Aware_Single-Stage_3D_Object_Detection_From_Point_Cloud_CVPR_2020_paper.html
- Code: https://github.com/skyhehe123/SA-SSD
**IDA-3D: Instance-Depth-Aware 3D Object Detection from Stereo Vision for Autonomous Driving**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Peng_IDA-3D_Instance-Depth-Aware_3D_Object_Detection_From_Stereo_Vision_for_Autonomous_CVPR_2020_paper.pdf
- Code: https://github.com/swords123/IDA-3D
**Train in Germany, Test in The USA: Making 3D Object Detectors Generalize**
- Paper: https://arxiv.org/abs/2005.08139
- Code: https://github.com/cxy1997/3D_adapt_auto_driving
**MLCVNet: Multi-Level Context VoteNet for 3D Object Detection**
- Paper: https://arxiv.org/abs/2004.05679
- Code: https://github.com/NUAAXQ/MLCVNet
**3DSSD: Point-based 3D Single Stage Object Detector**
- CVPR 2020 Oral
- Paper: https://arxiv.org/abs/2002.10187
- Code: https://github.com/tomztyang/3DSSD
**Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation**
- Paper: https://arxiv.org/abs/2004.03572
- Code: https://github.com/zju3dv/disprcn
**End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection**
- Paper: https://arxiv.org/abs/2004.03080
- Code: https://github.com/mileyan/pseudo-LiDAR_e2e
**DSGN: Deep Stereo Geometry Network for 3D Object Detection**
- Paper: https://arxiv.org/abs/2001.03398
- Code: https://github.com/chenyilun95/DSGN
**LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention**
- Paper: https://arxiv.org/abs/2004.01389
- Code: https://github.com/yinjunbo/3DVID
**PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection**
- Paper: https://arxiv.org/abs/1912.13192
- Code: https://github.com/sshaoshuai/PV-RCNN
**Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud**
- Paper: https://arxiv.org/abs/2003.01251
- Code: https://github.com/WeijingShi/Point-GNN
# Video Object Detection
**Memory Enhanced Global-Local Aggregation for Video Object Detection**
- Paper: https://arxiv.org/abs/2003.12063
- Code: https://github.com/Scalsol/mega.pytorch
# Object Tracking
**SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking**
- Paper: https://arxiv.org/abs/1911.07241
- Code: https://github.com/ohhhyeahhh/SiamCAR
**D3S -- A Discriminative Single Shot Segmentation Tracker**
- Paper: https://arxiv.org/abs/1911.08862
- Code: https://github.com/alanlukezic/d3s
**ROAM: Recurrently Optimizing Tracking Model**
- Paper: https://arxiv.org/abs/1907.12006
- Code: https://github.com/skyoung/ROAM
**Siam R-CNN: Visual Tracking by Re-Detection**
- Homepage: https://www.vision.rwth-aachen.de/page/siamrcnn
- Paper: https://arxiv.org/abs/1911.12836
- Paper 2: https://www.vision.rwth-aachen.de/media/papers/192/siamrcnn.pdf
- Code: https://github.com/VisualComputingInstitute/SiamR-CNN
**Cooling-Shrinking Attack: Blinding the Tracker with Imperceptible Noises**
- Paper: https://arxiv.org/abs/2003.09595
- Code: https://github.com/MasterBin-IIAU/CSA
**High-Performance Long-Term Tracking with Meta-Updater**
- Paper: https://arxiv.org/abs/2004.00305
- Code: https://github.com/Daikenan/LTMU
**AutoTrack: Towards High-Performance Visual Tracking for UAV with Automatic Spatio-Temporal Regularization**
- Paper: https://arxiv.org/abs/2003.12949
- Code: https://github.com/vision4robotics/AutoTrack
**Probabilistic Regression for Visual Tracking**
- Paper: https://arxiv.org/abs/2003.12565
- Code: https://github.com/visionml/pytracking
**MAST: A Memory-Augmented Self-supervised Tracker**
- Paper: https://arxiv.org/abs/2002.07793
- Code: https://github.com/zlai0/MAST
**Siamese Box Adaptive Network for Visual Tracking**
- Paper: https://arxiv.org/abs/2003.06761
- Code: https://github.com/hqucv/siamban
## Multi-Object Tracking
**3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset**
- Homepage: https://vap.aau.dk/3d-zef/
- Paper: https://arxiv.org/abs/2006.08466
- Code: https://bitbucket.org/aauvap/3d-zef/src/master/
- Dataset: https://motchallenge.net/data/3D-ZeF20
# Semantic Segmentation
**FDA: Fourier Domain Adaptation for Semantic Segmentation**
- Paper: https://arxiv.org/abs/2004.05498
- Code: https://github.com/YanchaoYang/FDA
**Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation**
- Paper: not yet available
- Code: https://github.com/JianqiangWan/Super-BPD
**Single-Stage Semantic Segmentation from Image Labels**
- Paper: https://arxiv.org/abs/2005.08104
- Code: https://github.com/visinf/1-stage-wseg
**Learning Texture Invariant Representation for Domain Adaptation of Semantic Segmentation**
- Paper: https://arxiv.org/abs/2003.00867
- Code: https://github.com/MyeongJin-Kim/Learning-Texture-Invariant-Representation
**MSeg: A Composite Dataset for Multi-domain Semantic Segmentation**
- Paper: http://vladlen.info/papers/MSeg.pdf
- Code: https://github.com/mseg-dataset/mseg-api
**CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement**
- Paper: https://arxiv.org/abs/2005.02551
- Code: https://github.com/hkchengrex/CascadePSP
**Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision**
- Oral
- Paper: https://arxiv.org/abs/2004.07703
- Code: https://github.com/feipan664/IntraDA
**Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation**
- Paper: https://arxiv.org/abs/2004.04581
- Code: https://github.com/YudeWang/SEAM
**Temporally Distributed Networks for Fast Video Segmentation**
- Paper: https://arxiv.org/abs/2004.01800
- Code: https://github.com/feinanshan/TDNet
**Context Prior for Scene Segmentation**
- Paper: https://arxiv.org/abs/2004.01547
- Code: https://git.io/ContextPrior
**Strip Pooling: Rethinking Spatial Pooling for Scene Parsing**
- Paper: https://arxiv.org/abs/2003.13328
- Code: https://github.com/Andrew-Qibin/SPNet
**Cars Can't Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks**
- Paper: https://arxiv.org/abs/2003.05128
- Code: https://github.com/shachoi/HANet
**Learning Dynamic Routing for Semantic Segmentation**
- Paper: https://arxiv.org/abs/2003.10401
- Code: https://github.com/yanwei-li/DynamicRouting
# Instance Segmentation
**D2Det: Towards High Quality Object Detection and Instance Segmentation**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Cao_D2Det_Towards_High_Quality_Object_Detection_and_Instance_Segmentation_CVPR_2020_paper.pdf
- Code: https://github.com/JialeCao001/D2Det
**PolarMask: Single Shot Instance Segmentation with Polar Representation**
- Paper: https://arxiv.org/abs/1909.13226
- Code: https://github.com/xieenze/PolarMask
- Explainer: https://zhuanlan.zhihu.com/p/84890413
**CenterMask: Real-Time Anchor-Free Instance Segmentation**
- Paper: https://arxiv.org/abs/1911.06667
- Code: https://github.com/youngwanLEE/CenterMask
**BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation**
- Paper: https://arxiv.org/abs/2001.00309
- Code: https://github.com/aim-uofa/AdelaiDet
**Deep Snake for Real-Time Instance Segmentation**
- Paper: https://arxiv.org/abs/2001.01629
- Code: https://github.com/zju3dv/snake
**Mask Encoding for Single Shot Instance Segmentation**
- Paper: https://arxiv.org/abs/2003.11712
- Code: https://github.com/aim-uofa/AdelaiDet
# Panoptic Segmentation
**Video Panoptic Segmentation**
- Paper: https://arxiv.org/abs/2006.11339
- Code: https://github.com/mcahny/vps
- Dataset: https://www.dropbox.com/s/ecem4kq0fdkver4/cityscapes-vps-dataset-1.0.zip?dl=0
**Pixel Consensus Voting for Panoptic Segmentation**
- Paper: https://arxiv.org/abs/2004.01849
- Code: not yet released
**BANet: Bidirectional Aggregation Network with Occlusion Handling for Panoptic Segmentation**
- Paper: https://arxiv.org/abs/2003.14031
- Code: https://github.com/Mooonside/BANet
# Video Object Segmentation
**A Transductive Approach for Video Object Segmentation**
- Paper: https://arxiv.org/abs/2004.07193
- Code: https://github.com/microsoft/transductive-vos.pytorch
**State-Aware Tracker for Real-Time Video Object Segmentation**
- Paper: https://arxiv.org/abs/2003.00482
- Code: https://github.com/MegviiDetection/video_analyst
**Learning Fast and Robust Target Models for Video Object Segmentation**
- Paper: https://arxiv.org/abs/2003.00908
- Code: https://github.com/andr345/frtm-vos
**Learning Video Object Segmentation from Unlabeled Videos**
- Paper: https://arxiv.org/abs/2003.05020
- Code: https://github.com/carrierlxk/MuG
# Superpixel Segmentation
**Superpixel Segmentation with Fully Convolutional Networks**
- Paper: https://arxiv.org/abs/2003.12929
- Code: https://github.com/fuy34/superpixel_fcn
# Interactive Image Segmentation
**Interactive Object Segmentation with Inside-Outside Guidance**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Interactive_Object_Segmentation_With_Inside-Outside_Guidance_CVPR_2020_paper.pdf
- Code: https://github.com/shiyinzhang/Inside-Outside-Guidance
- Dataset: https://github.com/shiyinzhang/Pixel-ImageNet
# NAS
**AOWS: Adaptive and optimal network width search with latency constraints**
- Paper: https://arxiv.org/abs/2005.10481
- Code: https://github.com/bermanmaxim/AOWS
**Densely Connected Search Space for More Flexible Neural Architecture Search**
- Paper: https://arxiv.org/abs/1906.09607
- Code: https://github.com/JaminFong/DenseNAS
**MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning**
- Paper: https://arxiv.org/abs/2003.14058
- Code: https://github.com/bhpfelix/MTLNAS
**FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions**
- Paper: https://arxiv.org/abs/2004.05565
- Code: https://github.com/facebookresearch/mobile-vision
**Neural Architecture Search for Lightweight Non-Local Networks**
- Paper: https://arxiv.org/abs/2004.01961
- Code: https://github.com/LiYingwei/AutoNL
**Rethinking Performance Estimation in Neural Architecture Search**
- Paper: https://arxiv.org/abs/2005.09917
- Code: https://github.com/zhengxiawu/rethinking_performance_estimation_in_NAS
- Explainer 1: https://www.zhihu.com/question/372070853/answer/1035234510
- Explainer 2: https://zhuanlan.zhihu.com/p/111167409
**CARS: Continuous Evolution for Efficient Neural Architecture Search**
- Paper: https://arxiv.org/abs/1909.04977
- Code (coming soon): https://github.com/huawei-noah/CARS
# GAN
**SEAN: Image Synthesis with Semantic Region-Adaptive Normalization**
- Paper: https://arxiv.org/abs/1911.12861
- Code: https://github.com/ZPdesu/SEAN
**Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Chen_Reusing_Discriminators_for_Encoding_Towards_Unsupervised_Image-to-Image_Translation_CVPR_2020_paper.html
- Code: https://github.com/alpc91/NICE-GAN-pytorch
**Distribution-induced Bidirectional Generative Adversarial Network for Graph Representation Learning**
- Paper: https://arxiv.org/abs/1912.01899
- Code: https://github.com/SsGood/DBGAN
**PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer**
- Paper: https://arxiv.org/abs/1909.06956
- Code: https://github.com/wtjiang98/PSGAN
**Semantically Multi-modal Image Synthesis**
- Homepage: http://seanseattle.github.io/SMIS
- Paper: https://arxiv.org/abs/2003.12697
- Code: https://github.com/Seanseattle/SMIS
**Unpaired Portrait Drawing Generation via Asymmetric Cycle Mapping**
- Paper: https://yiranran.github.io/files/CVPR2020_Unpaired%20Portrait%20Drawing%20Generation%20via%20Asymmetric%20Cycle%20Mapping.pdf
- Code: https://github.com/yiranran/Unpaired-Portrait-Drawing
**Learning to Cartoonize Using White-box Cartoon Representations**
- Paper: https://github.com/SystemErrorWang/White-box-Cartoonization/blob/master/paper/06791.pdf
- Homepage: https://systemerrorwang.github.io/White-box-Cartoonization/
- Code: https://github.com/SystemErrorWang/White-box-Cartoonization
- Explainer: https://zhuanlan.zhihu.com/p/117422157
- Demo video: https://www.bilibili.com/video/av56708333
**GAN Compression: Efficient Architectures for Interactive Conditional GANs**
- Paper: https://arxiv.org/abs/2003.08936
- Code: https://github.com/mit-han-lab/gan-compression
**Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions**
- Paper: https://arxiv.org/abs/2003.01826
- Code: https://github.com/cc-hpc-itwm/UpConv
# Re-ID
**High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Wang_High-Order_Information_Matters_Learning_Relation_and_Topology_for_Occluded_Person_CVPR_2020_paper.html
- Code: https://github.com/wangguanan/HOReID
**COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification**
- Paper: https://arxiv.org/abs/2005.07862
- Dataset: not yet available
**Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification With Deep Mis-Ranking**
- Paper: https://arxiv.org/abs/2004.04199
- Code: https://github.com/whj363636/Adversarial-attack-on-Person-ReID-With-Deep-Mis-Ranking
**Pose-guided Visible Part Matching for Occluded Person ReID**
- Paper: https://arxiv.org/abs/2004.00230
- Code: https://github.com/hh23333/PVPM
**Weakly supervised discriminative feature learning with state information for person identification**
- Paper: https://arxiv.org/abs/2002.11939
- Code: https://github.com/KovenYu/state-information
# 3D Point Clouds (Classification/Segmentation/Registration, etc.)
## 3D Point Cloud Convolution
**PointASNL: Robust Point Clouds Processing using Nonlocal Neural Networks with Adaptive Sampling**
- Paper: https://arxiv.org/abs/2003.00492
- Code: https://github.com/yanx27/PointASNL
**Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds**
- Paper: https://arxiv.org/abs/2003.12971
- Code: https://github.com/raoyongming/PointGLR
**Grid-GCN for Fast and Scalable Point Cloud Learning**
- Paper: https://arxiv.org/abs/1912.02984
- Code: https://github.com/Xharlie/Grid-GCN
**FPConv: Learning Local Flattening for Point Convolution**
- Paper: https://arxiv.org/abs/2002.10701
- Code: https://github.com/lyqun/FPConv
## 3D Point Cloud Classification
**PointAugment: an Auto-Augmentation Framework for Point Cloud Classification**
- Paper: https://arxiv.org/abs/2002.10876
- Code (coming soon): https://github.com/liruihui/PointAugment/
## 3D Point Cloud Semantic Segmentation
**RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds**
- Paper: https://arxiv.org/abs/1911.11236
- Code: https://github.com/QingyongHu/RandLA-Net
- Explainer: https://zhuanlan.zhihu.com/p/105433460
**Weakly Supervised Semantic Point Cloud Segmentation: Towards 10X Fewer Labels**
- Paper: https://arxiv.org/abs/2004.04091
- Code: https://github.com/alex-xun-xu/WeakSupPointCloudSeg
**PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation**
- Paper: https://arxiv.org/abs/2003.14032
- Code: https://github.com/edwardzhou130/PolarSeg
**Learning to Segment 3D Point Clouds in 2D Image Space**
- Paper: https://arxiv.org/abs/2003.05593
- Code: https://github.com/WPI-VISLab/Learning-to-Segment-3D-Point-Clouds-in-2D-Image-Space
## 3D Point Cloud Instance Segmentation
**PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation**
- Paper: https://arxiv.org/abs/2004.01658
- Code: https://github.com/Jia-Research-Lab/PointGroup
## 3D Point Cloud Registration
**Feature-metric Registration: A Fast Semi-supervised Approach for Robust Point Cloud Registration without Correspondences**
- Paper: https://arxiv.org/abs/2005.01014
- Code: https://github.com/XiaoshuiHuang/fmr
**D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features**
- Paper: https://arxiv.org/abs/2003.03164
- Code: https://github.com/XuyangBai/D3Feat
**RPM-Net: Robust Point Matching using Learned Features**
- Paper: https://arxiv.org/abs/2003.13479
- Code: https://github.com/yewzijian/RPMNet
## 3D Point Cloud Completion
**Cascaded Refinement Network for Point Cloud Completion**
- Paper: https://arxiv.org/abs/2004.03327
- Code: https://github.com/xiaogangw/cascaded-point-completion
## 3D Point Cloud Object Tracking
**P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds**
- Paper: https://arxiv.org/abs/2005.13888
- Code: https://github.com/HaozheQi/P2B
## Others
**An Efficient PointLSTM for Point Clouds Based Gesture Recognition**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Min_An_Efficient_PointLSTM_for_Point_Clouds_Based_Gesture_Recognition_CVPR_2020_paper.html
- Code: https://github.com/Blueprintf/pointlstm-gesture-recognition-pytorch
# Face
## Face Recognition
**CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition**
- Paper: https://arxiv.org/abs/2004.00288
- Code: https://github.com/HuangYG123/CurricularFace
**Learning Meta Face Recognition in Unseen Domains**
- Paper: https://arxiv.org/abs/2003.07733
- Code: https://github.com/cleardusk/MFR
- Explainer: https://mp.weixin.qq.com/s/YZoEnjpnlvb90qSI3xdJqQ
## Face Detection
## Face Anti-Spoofing
**Searching Central Difference Convolutional Networks for Face Anti-Spoofing**
- Paper: https://arxiv.org/abs/2003.04092
- Code: https://github.com/ZitongYu/CDCN
## Facial Expression Recognition
**Suppressing Uncertainties for Large-Scale Facial Expression Recognition**
- Paper: https://arxiv.org/abs/2002.10392
- Code (coming soon): https://github.com/kaiwang960112/Self-Cure-Network
## Face Frontalization
**Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images**
- Paper: https://arxiv.org/abs/2003.08124
- Code: https://github.com/Hangz-nju-cuhk/Rotate-and-Render
## 3D Face Reconstruction
**AvatarMe: Realistically Renderable 3D Facial Reconstruction "in-the-wild"**
- Paper: https://arxiv.org/abs/2003.13845
- Dataset: https://github.com/lattas/AvatarMe
**FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction**
- Paper: https://arxiv.org/abs/2003.13989
- Code: https://github.com/zhuhao-nju/facescape
# Human Pose Estimation (2D/3D)
## 2D Human Pose Estimation
**TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting**
- Homepage: https://yzhq97.github.io/transmomo/
- Paper: https://arxiv.org/abs/2003.14401
- Code: https://github.com/yzhq97/transmomo.pytorch
**HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation**
- Paper: https://arxiv.org/abs/1908.10357
- Code: https://github.com/HRNet/HigherHRNet-Human-Pose-Estimation
**The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation**
- Paper: https://arxiv.org/abs/1911.07524
- Code: https://github.com/HuangJunJie2017/UDP-Pose
- Explainer: https://zhuanlan.zhihu.com/p/92525039
**Distribution-Aware Coordinate Representation for Human Pose Estimation**
- Homepage: https://ilovepose.github.io/coco/
- Paper: https://arxiv.org/abs/1910.06278
- Code: https://github.com/ilovepose/DarkPose
## 3D Human Pose Estimation
**Cascaded Deep Monocular 3D Human Pose Estimation With Evolutionary Training Data**
- Paper: https://arxiv.org/abs/2006.07778
- Code: https://github.com/Nicholasli1995/EvoSkeleton
**Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach**
- Homepage: https://www.zhe-zhang.com/cvpr2020
- Paper: https://arxiv.org/abs/2003.11163
- Code: https://github.com/CHUNYUWANG/imu-human-pose-pytorch
**Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data**
- Paper: https://arxiv.org/abs/2004.01166
- Code: https://github.com/Healthcare-Robotics/bodies-at-rest
- Dataset: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KOA4ML
**Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis**
- Homepage: http://val.cds.iisc.ac.in/pgp-human/
- Paper: https://arxiv.org/abs/2004.04400
**Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation**
- Paper: https://arxiv.org/abs/2004.00329
- Code: https://github.com/fabbrimatteo/LoCO
**VIBE: Video Inference for Human Body Pose and Shape Estimation**
- Paper: https://arxiv.org/abs/1912.05656
- Code: https://github.com/mkocabas/VIBE
**Back to the Future: Joint Aware Temporal Deep Learning 3D Human Pose Estimation**
- Paper: https://arxiv.org/abs/2002.11251
- Code: https://github.com/vnmr/JointVideoPose3D
**Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS**
- Paper: https://arxiv.org/abs/2003.03972
- Dataset: not yet available
# Human Parsing
**Correlating Edge, Pose with Parsing**
- Paper: https://arxiv.org/abs/2005.01431
- Code: https://github.com/ziwei-zh/CorrPM
# Scene Text Detection
**STEFANN: Scene Text Editor using Font Adaptive Neural Network**
- Homepage: https://prasunroy.github.io/stefann/
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Roy_STEFANN_Scene_Text_Editor_Using_Font_Adaptive_Neural_Network_CVPR_2020_paper.html
- Code: https://github.com/prasunroy/stefann
- Dataset: https://drive.google.com/open?id=1sEDiX_jORh2X-HSzUnjIyZr-G9LJIw1k
**ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_ContourNet_Taking_a_Further_Step_Toward_Accurate_Arbitrary-Shaped_Scene_Text_CVPR_2020_paper.pdf
- Code: https://github.com/wangyuxin87/ContourNet
**UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**
- Paper: https://arxiv.org/abs/2003.10608
- Code and dataset: https://github.com/Jyouhou/UnrealText/
**ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network**
- Paper: https://arxiv.org/abs/2002.10200
- Code (coming soon): https://github.com/Yuliang-Liu/bezier_curve_text_spotting
- Code (coming soon): https://github.com/aim-uofa/adet
**Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection**
- Paper: https://arxiv.org/abs/2003.07493
- Code: https://github.com/GXYM/DRRG
# Scene Text Recognition
**SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition**
- Paper: https://arxiv.org/abs/2005.10977
- Code: https://github.com/Pay20Y/SEED
**UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**
- Paper: https://arxiv.org/abs/2003.10608
- Code and dataset: https://github.com/Jyouhou/UnrealText/
**ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network**
- Paper: https://arxiv.org/abs/2002.10200
- Code (coming soon): https://github.com/aim-uofa/adet
**Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition**
- Paper: https://arxiv.org/abs/2003.06606
- Code: https://github.com/Canjie-Luo/Text-Image-Augmentation
# Feature (Point) Detection and Description
**SuperGlue: Learning Feature Matching with Graph Neural Networks**
- Paper: https://arxiv.org/abs/1911.11763
- Code: https://github.com/magicleap/SuperGluePretrainedNetwork
# Super-Resolution
## Image Super-Resolution
**Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Guo_Closed-Loop_Matters_Dual_Regression_Networks_for_Single_Image_Super-Resolution_CVPR_2020_paper.html
- Code: https://github.com/guoyongcs/DRN
**Learning Texture Transformer Network for Image Super-Resolution**
- Paper: https://arxiv.org/abs/2006.04139
- Code: https://github.com/FuzhiYang/TTSR
**Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining**
- Paper: https://arxiv.org/abs/2006.01424
- Code: https://github.com/SHI-Labs/Cross-Scale-Non-Local-Attention
**Structure-Preserving Super Resolution with Gradient Guidance**
- Paper: https://arxiv.org/abs/2003.13081
- Code: https://github.com/Maclory/SPSR
**Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy**
- Paper: https://arxiv.org/abs/2004.00448
- Code: https://github.com/clovaai/cutblur
## Video Super-Resolution
**TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution**
- Paper: https://arxiv.org/abs/1812.02898
- Code: https://github.com/YapengTian/TDAN-VSR-CVPR-2020
**Space-Time-Aware Multi-Resolution Video Enhancement**
- Homepage: https://alterzero.github.io/projects/STAR.html
- Paper: http://arxiv.org/abs/2003.13170
- Code: https://github.com/alterzero/STARnet
**Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution**
- Paper: https://arxiv.org/abs/2002.11616
- Code: https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020
# Model Compression/Pruning
**DMCP: Differentiable Markov Channel Pruning for Neural Networks**
- Paper: https://arxiv.org/abs/2005.03354
- Code: https://github.com/zx55/dmcp
**Forward and Backward Information Retention for Accurate Binary Neural Networks**
- Paper: https://arxiv.org/abs/1909.10788
- Code: https://github.com/htqin/IR-Net
**Towards Efficient Model Compression via Learned Global Ranking**
- Paper: https://arxiv.org/abs/1904.12368
- Code: https://github.com/cmu-enyac/LeGR
**HRank: Filter Pruning using High-Rank Feature Map**
- Paper: http://arxiv.org/abs/2002.10179
- Code: https://github.com/lmbxmu/HRank
**GAN Compression: Efficient Architectures for Interactive Conditional GANs**
- Paper: https://arxiv.org/abs/2003.08936
- Code: https://github.com/mit-han-lab/gan-compression
**Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression**
- Paper: https://arxiv.org/abs/2003.08935
- Code: https://github.com/ofsoundof/group_sparsity
# Video Understanding/Action Recognition
**Oops! Predicting Unintentional Action in Video**
- Homepage: https://oops.cs.columbia.edu/
- Paper: https://arxiv.org/abs/1911.11206
- Code: https://github.com/cvlab-columbia/oops
- Dataset: https://oops.cs.columbia.edu/data
**PREDICT & CLUSTER: Unsupervised Skeleton Based Action Recognition**
- Paper: https://arxiv.org/abs/1911.12409
- Code: https://github.com/shlizee/Predict-Cluster
**Intra- and Inter-Action Understanding via Temporal Action Parsing**
- Paper: https://arxiv.org/abs/2005.10229
- Homepage and dataset: https://sdolivia.github.io/TAPOS/
**3DV: 3D Dynamic Voxel for Action Recognition in Depth Video**
- Paper: https://arxiv.org/abs/2005.05501
- Code: https://github.com/3huo/3DV-Action
**FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding**
- Homepage: https://sdolivia.github.io/FineGym/
- Paper: https://arxiv.org/abs/2004.06704
**TEA: Temporal Excitation and Aggregation for Action Recognition**
- Paper: https://arxiv.org/abs/2004.01398
- Code: https://github.com/Phoenix1327/tea-action-recognition
**X3D: Expanding Architectures for Efficient Video Recognition**
- Paper: https://arxiv.org/abs/2004.04730
- Code: https://github.com/facebookresearch/SlowFast
**Temporal Pyramid Network for Action Recognition**
- Homepage: https://decisionforce.github.io/TPN
- Paper: https://arxiv.org/abs/2004.03548
- Code: https://github.com/decisionforce/TPN
## Skeleton-Based Action Recognition
**Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition**
- Paper: https://arxiv.org/abs/2003.14111
- Code: https://github.com/kenziyuliu/ms-g3d
# Crowd Counting
# Depth Estimation
**BiFuse: Monocular 360° Depth Estimation via Bi-Projection Fusion**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_BiFuse_Monocular_360_Depth_Estimation_via_Bi-Projection_Fusion_CVPR_2020_paper.pdf
- Code: https://github.com/Yeh-yu-hsuan/BiFuse
**Focus on defocus: bridging the synthetic to real domain gap for depth estimation**
- Paper: https://arxiv.org/abs/2005.09623
- Code: https://github.com/dvl-tum/defocus-net
**Bi3D: Stereo Depth Estimation via Binary Classifications**
- Paper: https://arxiv.org/abs/2005.07274
- Code: https://github.com/NVlabs/Bi3D
**AANet: Adaptive Aggregation Network for Efficient Stereo Matching**
- Paper: https://arxiv.org/abs/2004.09548
- Code: https://github.com/haofeixu/aanet
**Towards Better Generalization: Joint Depth-Pose Learning without PoseNet**
- Paper: https://github.com/B1ueber2y/TrianFlow
- Code: https://github.com/B1ueber2y/TrianFlow
## Monocular Depth Estimation
**On the uncertainty of self-supervised monocular depth estimation**
- Paper: https://arxiv.org/abs/2005.06209
- Code: https://github.com/mattpoggi/mono-uncertainty
**3D Packing for Self-Supervised Monocular Depth Estimation**
- Paper: https://arxiv.org/abs/1905.02693
- Code: https://github.com/TRI-ML/packnet-sfm
- Demo video: https://www.bilibili.com/video/av70562892/
**Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation**
- Paper: https://arxiv.org/abs/2002.12114
- Code: https://github.com/yzhao520/ARC
# 6D Object Pose Estimation
**PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/He_PVN3D_A_Deep_Point-Wise_3D_Keypoints_Voting_Network_for_6DoF_CVPR_2020_paper.pdf
- Code: https://github.com/ethnhe/PVN3D
**MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion**
- Paper: https://arxiv.org/abs/2004.04336
- Code: https://github.com/wkentaro/morefusion
**EPOS: Estimating 6D Pose of Objects with Symmetries**
- Homepage: http://cmp.felk.cvut.cz/epos
- Paper: https://arxiv.org/abs/2004.00605
**G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features**
- Paper: https://arxiv.org/abs/2003.11089
- Code: https://github.com/DC1991/G2L_Net
# Hand Pose Estimation
**HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation**
- Paper: https://arxiv.org/abs/2004.00060
- Homepage: http://vision.sice.indiana.edu/projects/hopenet
**Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data**
- Paper: https://arxiv.org/abs/2003.09572
- Code: https://github.com/CalciferZh/minimal-hand
# Saliency Detection
**JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection**
- Paper: https://arxiv.org/abs/2004.08515
- Code: https://github.com/kerenfu/JLDCF/
**UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders**
- Homepage: http://dpfan.net/d3netbenchmark/
- Paper: https://arxiv.org/abs/2004.05763
- Code: https://github.com/JingZhang617/UCNet
# Denoising
**A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising**
- Paper: https://arxiv.org/abs/2003.12751
- Code: https://github.com/Vandermode/NoiseModel
**CycleISP: Real Image Restoration via Improved Data Synthesis**
- Paper: https://arxiv.org/abs/2003.07761
- Code: https://github.com/swz30/CycleISP
# Deraining
**Multi-Scale Progressive Fusion Network for Single Image Deraining**
- Paper: https://arxiv.org/abs/2003.10985
- Code: https://github.com/kuihua/MSPFN
**Detail-recovery Image Deraining via Context Aggregation Networks**
- Paper: https://openaccess.thecvf.com/content_CVPR_2020/html/Deng_Detail-recovery_Image_Deraining_via_Context_Aggregation_Networks_CVPR_2020_paper.html
- Code: https://github.com/Dengsgithub/DRD-Net
# Deblurring
## Video Deblurring
**Cascaded Deep Video Deblurring Using Temporal Sharpness Prior**
- Homepage: https://csbhr.github.io/projects/cdvd-tsp/index.html
- Paper: https://arxiv.org/abs/2004.02501
- Code: https://github.com/csbhr/CDVD-TSP
# Dehazing
**Domain Adaptation for Image Dehazing**
- Paper: https://arxiv.org/abs/2005.04668
- Code: https://github.com/HUSTSYJ/DA_dahazing
**Multi-Scale Boosted Dehazing Network with Dense Feature Fusion**
- Paper: https://arxiv.org/abs/2004.13388
- Code: https://github.com/BookerDeWitt/MSBDN-DFF
# Feature Point Detection and Description
**ASLFeat: Learning Local Features of Accurate Shape and Localization**
- Paper: https://arxiv.org/abs/2003.10071
- Code: https://github.com/lzx551402/aslfeat
# Visual Question Answering (VQA)
**VC R-CNN: Visual Commonsense R-CNN**
- Paper: https://arxiv.org/abs/2002.12204
- Code: https://github.com/Wangt-CN/VC-R-CNN
# Video Question Answering (VideoQA)
**Hierarchical Conditional Relation Networks for Video Question Answering**
- Paper: https://arxiv.org/abs/2002.10698
- Code: https://github.com/thaolmk54/hcrn-videoqa
# Vision-and-Language Navigation
**Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training**
- Paper: https://arxiv.org/abs/2002.10638
- Code (coming soon): https://github.com/weituo12321/PREVALENT
# Video Compression
**Learning for Video Compression with Hierarchical Quality and Recurrent Enhancement**
- Paper: https://arxiv.org/abs/2003.01966
- Code: https://github.com/RenYang-home/HLVC
# Video Frame Interpolation
**AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation**
- Paper: https://arxiv.org/abs/1907.10244
- Code: https://github.com/HyeongminLEE/AdaCoF-pytorch
**FeatureFlow: Robust Video Interpolation via Structure-to-Texture Generation**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Gui_FeatureFlow_Robust_Video_Interpolation_via_Structure-to-Texture_Generation_CVPR_2020_paper.html
- Code: https://github.com/CM-BF/FeatureFlow
**Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution**
- Paper: https://arxiv.org/abs/2002.11616
- Code: https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020
**Space-Time-Aware Multi-Resolution Video Enhancement**
- Homepage: https://alterzero.github.io/projects/STAR.html
- Paper: http://arxiv.org/abs/2003.13170
- Code: https://github.com/alterzero/STARnet
**Scene-Adaptive Video Frame Interpolation via Meta-Learning**
- Paper: https://arxiv.org/abs/2004.00779
- Code: https://github.com/myungsub/meta-interpolation
**Softmax Splatting for Video Frame Interpolation**
- Homepage: http://sniklaus.com/papers/softsplat
- Paper: https://arxiv.org/abs/2003.05534
- Code: https://github.com/sniklaus/softmax-splatting
# Style Transfer
**Diversified Arbitrary Style Transfer via Deep Feature Perturbation**
- Paper: https://arxiv.org/abs/1909.08223
- Code: https://github.com/EndyWon/Deep-Feature-Perturbation
**Collaborative Distillation for Ultra-Resolution Universal Style Transfer**
- Paper: https://arxiv.org/abs/2003.08436
- Code: https://github.com/mingsun-tse/collaborative-distillation
# Lane Detection
**Inter-Region Affinity Distillation for Road Marking Segmentation**
- Paper: https://arxiv.org/abs/2004.05304
- Code: https://github.com/cardwing/Codes-for-IntRA-KD
# Human-Object Interaction (HOI) Detection
**PPDM: Parallel Point Detection and Matching for Real-time Human-Object Interaction Detection**
- Paper: https://arxiv.org/abs/1912.12898
- Code: https://github.com/YueLiao/PPDM
**Detailed 2D-3D Joint Representation for Human-Object Interaction**
- Paper: https://arxiv.org/abs/2004.08154
- Code: https://github.com/DirtyHarryLYL/DJ-RN
**Cascaded Human-Object Interaction Recognition**
- Paper: https://arxiv.org/abs/2003.04262
- Code: https://github.com/tfzhou/C-HOI
**VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions**
- Paper: https://arxiv.org/abs/2003.05541
- Code: https://github.com/ASMIftekhar/VSGNet
# Trajectory Prediction
**The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction**
- Paper: https://arxiv.org/abs/1912.06445
- Code: https://github.com/JunweiLiang/Multiverse
- Dataset: https://next.cs.cmu.edu/multiverse/
**Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction**
- Paper: https://arxiv.org/abs/2002.11927
- Code: https://github.com/abduallahmohamed/Social-STGCNN
# Motion Prediction
**Collaborative Motion Prediction via Neural Motion Message Passing**
- Paper: https://arxiv.org/abs/2003.06594
- Code: https://github.com/PhyllisH/NMMP
**MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps**
- Paper: https://arxiv.org/abs/2003.06754
- Code: https://github.com/pxiangwu/MotionNet
# Optical Flow Estimation
**Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation**
- Paper: https://arxiv.org/abs/2003.13045
- Code: https://github.com/lliuz/ARFlow
# Image Retrieval
**Evade Deep Image Retrieval by Stashing Private Images in the Hash Space**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Xiao_Evade_Deep_Image_Retrieval_by_Stashing_Private_Images_in_the_CVPR_2020_paper.html
- Code: https://github.com/sugarruy/hashstash
# Virtual Try-On
**Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content**
- Paper: https://arxiv.org/abs/2003.05863
- Code: https://github.com/switchablenorms/DeepFashion_Try_On
# HDR
**Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline**
- Homepage: https://www.cmlab.csie.ntu.edu.tw/~yulunliu/SingleHDR
- Paper download link: https://www.cmlab.csie.ntu.edu.tw/~yulunliu/SingleHDR_/00942.pdf
- Code: https://github.com/alex04072000/SingleHDR
# Adversarial Examples
**Enhancing Cross-Task Black-Box Transferability of Adversarial Examples With Dispersion Reduction**
- Paper: https://openaccess.thecvf.com/content_CVPR_2020/papers/Lu_Enhancing_Cross-Task_Black-Box_Transferability_of_Adversarial_Examples_With_Dispersion_Reduction_CVPR_2020_paper.pdf
- Code: https://github.com/erbloo/dr_cvpr20
**Towards Large yet Imperceptible Adversarial Image Perturbations with Perceptual Color Distance**
- Paper: https://arxiv.org/abs/1911.02466
- Code: https://github.com/ZhengyuZhao/PerC-Adversarial
# 3D Reconstruction
**Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild**
- **CVPR 2020 Best Paper**
- Homepage: https://elliottwu.com/projects/unsup3d/
- Paper: https://arxiv.org/abs/1911.11130
- Code: https://github.com/elliottwu/unsup3d
**Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization**
- Homepage: https://shunsukesaito.github.io/PIFuHD/
- Paper: https://arxiv.org/abs/2004.00452
- Code: https://github.com/facebookresearch/pifuhd
**TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Patel_TailorNet_Predicting_Clothing_in_3D_as_a_Function_of_Human_CVPR_2020_paper.pdf
- Code: https://github.com/chaitanya100100/TailorNet
- Dataset: https://github.com/zycliao/TailorNet_dataset
**Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Chibane_Implicit_Functions_in_Feature_Space_for_3D_Shape_Reconstruction_and_CVPR_2020_paper.pdf
- Code: https://github.com/jchibane/if-net
**Learning to Transfer Texture from Clothing Images to 3D Humans**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Mir_Learning_to_Transfer_Texture_From_Clothing_Images_to_3D_Humans_CVPR_2020_paper.pdf
- Code: https://github.com/aymenmir1/pix2surf
# Depth Completion
**Uncertainty-Aware CNNs for Depth Completion: Uncertainty from Beginning to End**
- Paper: https://arxiv.org/abs/2006.03349
- Code: https://github.com/abdo-eldesokey/pncnn
# Semantic Scene Completion
**3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior**
- Paper: https://arxiv.org/abs/2003.14052
- Code: https://github.com/charlesCXK/TorchSSC
# Image/Video Captioning
**Syntax-Aware Action Targeting for Video Captioning**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Zheng_Syntax-Aware_Action_Targeting_for_Video_Captioning_CVPR_2020_paper.pdf
- Code: https://github.com/SydCaption/SAAT
# Wireframe Parsing
**Holistically-Attracted Wireframe Parsing**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Xue_Holistically-Attracted_Wireframe_Parsing_CVPR_2020_paper.html
- Code: https://github.com/cherubicXN/hawp
# Datasets
**OASIS: A Large-Scale Dataset for Single Image 3D in the Wild**
- Paper: https://arxiv.org/abs/2007.13215
- Dataset: https://oasis.cs.princeton.edu/
**STEFANN: Scene Text Editor using Font Adaptive Neural Network**
- Homepage: https://prasunroy.github.io/stefann/
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Roy_STEFANN_Scene_Text_Editor_Using_Font_Adaptive_Neural_Network_CVPR_2020_paper.html
- Code: https://github.com/prasunroy/stefann
- Dataset: https://drive.google.com/open?id=1sEDiX_jORh2X-HSzUnjIyZr-G9LJIw1k
**Interactive Object Segmentation with Inside-Outside Guidance**
- Paper download link: http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Interactive_Object_Segmentation_With_Inside-Outside_Guidance_CVPR_2020_paper.pdf
- Code: https://github.com/shiyinzhang/Inside-Outside-Guidance
- Dataset: https://github.com/shiyinzhang/Pixel-ImageNet
**Video Panoptic Segmentation**
- Paper: https://arxiv.org/abs/2006.11339
- Code: https://github.com/mcahny/vps
- Dataset: https://www.dropbox.com/s/ecem4kq0fdkver4/cityscapes-vps-dataset-1.0.zip?dl=0
**FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Li_FSS-1000_A_1000-Class_Dataset_for_Few-Shot_Segmentation_CVPR_2020_paper.html
- Code: https://github.com/HKUSTCV/FSS-1000
- Dataset: https://github.com/HKUSTCV/FSS-1000
**3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset**
- Homepage: https://vap.aau.dk/3d-zef/
- Paper: https://arxiv.org/abs/2006.08466
- Code: https://bitbucket.org/aauvap/3d-zef/src/master/
- Dataset: https://motchallenge.net/data/3D-ZeF20
**TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Patel_TailorNet_Predicting_Clothing_in_3D_as_a_Function_of_Human_CVPR_2020_paper.pdf
- Code: https://github.com/chaitanya100100/TailorNet
- Dataset: https://github.com/zycliao/TailorNet_dataset
**Oops! Predicting Unintentional Action in Video**
- Homepage: https://oops.cs.columbia.edu/
- Paper: https://arxiv.org/abs/1911.11206
- Code: https://github.com/cvlab-columbia/oops
- Dataset: https://oops.cs.columbia.edu/data
**The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction**
- Paper: https://arxiv.org/abs/1912.06445
- Code: https://github.com/JunweiLiang/Multiverse
- Dataset: https://next.cs.cmu.edu/multiverse/
**Open Compound Domain Adaptation**
- Homepage: https://liuziwei7.github.io/projects/CompoundDomain.html
- Dataset: https://drive.google.com/drive/folders/1_uNTF8RdvhS_sqVTnYx17hEOQpefmE2r?usp=sharing
- Paper: https://arxiv.org/abs/1909.03403
- Code: https://github.com/zhmiao/OpenCompoundDomainAdaptation-OCDA
**Intra- and Inter-Action Understanding via Temporal Action Parsing**
- Paper: https://arxiv.org/abs/2005.10229
- Homepage and dataset: https://sdolivia.github.io/TAPOS/
**Dynamic Refinement Network for Oriented and Densely Packed Object Detection**
- Paper download link: https://arxiv.org/abs/2005.09973
- Code and dataset: https://github.com/Anymake/DRN_CVPR2020
**COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification**
- Paper: https://arxiv.org/abs/2005.07862
- Dataset: not yet available
**KeypointNet: A Large-scale 3D Keypoint Dataset Aggregated from Numerous Human Annotations**
- Paper: https://arxiv.org/abs/2002.12687
- Dataset: https://github.com/qq456cvb/KeypointNet
**MSeg: A Composite Dataset for Multi-domain Semantic Segmentation**
- Paper: http://vladlen.info/papers/MSeg.pdf
- Code: https://github.com/mseg-dataset/mseg-api
- Dataset: https://github.com/mseg-dataset/mseg-semantic
**AvatarMe: Realistically Renderable 3D Facial Reconstruction "in-the-wild"**
- Paper: https://arxiv.org/abs/2003.13845
- Dataset: https://github.com/lattas/AvatarMe
**Learning to Autofocus**
- Paper: https://arxiv.org/abs/2004.12260
- Dataset: not yet available
**FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction**
- Paper: https://arxiv.org/abs/2003.13989
- Code: https://github.com/zhuhao-nju/facescape
**Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data**
- Paper download link: https://arxiv.org/abs/2004.01166
- Code: https://github.com/Healthcare-Robotics/bodies-at-rest
- Dataset: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KOA4ML
**FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding**
- Homepage: https://sdolivia.github.io/FineGym/
- Paper: https://arxiv.org/abs/2004.06704
**A Local-to-Global Approach to Multi-modal Movie Scene Segmentation**
- Homepage: https://anyirao.com/projects/SceneSeg.html
- Paper download link: https://arxiv.org/abs/2004.02678
- Code: https://github.com/AnyiRao/SceneSeg
**Deep Homography Estimation for Dynamic Scenes**
- Paper: https://arxiv.org/abs/2004.02132
- Dataset: https://github.com/lcmhoang/hmg-dynamics
**Assessing Image Quality Issues for Real-World Problems**
- Homepage: https://vizwiz.org/tasks-and-datasets/image-quality-issues/
- Paper: https://arxiv.org/abs/2003.12511
**UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**
- Paper: https://arxiv.org/abs/2003.10608
- Code and dataset: https://github.com/Jyouhou/UnrealText/
**PANDA: A Gigapixel-level Human-centric Video Dataset**
- Paper: https://arxiv.org/abs/2003.04852
- Dataset: http://www.panda-dataset.com/
**IntrA: 3D Intracranial Aneurysm Dataset for Deep Learning**
- Paper: https://arxiv.org/abs/2003.02920
- Dataset: https://github.com/intra3d2019/IntrA
**Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS**
- Paper: https://arxiv.org/abs/2003.03972
- Dataset: not yet available
# Others
**CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus**
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Kluger_CONSAC_Robust_Multi-Model_Fitting_by_Conditional_Sample_Consensus_CVPR_2020_paper.html
- Code: https://github.com/fkluger/consac
**Learning to Learn Single Domain Generalization**
- Paper: https://arxiv.org/abs/2003.13216
- Code: https://github.com/joffery/M-ADA
**Open Compound Domain Adaptation**
- Homepage: https://liuziwei7.github.io/projects/CompoundDomain.html
- Dataset: https://drive.google.com/drive/folders/1_uNTF8RdvhS_sqVTnYx17hEOQpefmE2r?usp=sharing
- Paper: https://arxiv.org/abs/1909.03403
- Code: https://github.com/zhmiao/OpenCompoundDomainAdaptation-OCDA
**Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision**
- Paper: http://www.cvlibs.net/publications/Niemeyer2020CVPR.pdf
- Code: https://github.com/autonomousvision/differentiable_volumetric_rendering
**QEBA: Query-Efficient Boundary-Based Blackbox Attack**
- Paper: https://arxiv.org/abs/2005.14137
- Code: https://github.com/AI-secure/QEBA
**Equalization Loss for Long-Tailed Object Recognition**
- Paper: https://arxiv.org/abs/2003.05176
- Code: https://github.com/tztztztztz/eql.detectron2
**Instance-aware Image Colorization**
- Homepage: https://ericsujw.github.io/InstColorization/
- Paper: https://arxiv.org/abs/2005.10825
- Code: https://github.com/ericsujw/InstColorization
**Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting**
- Paper: https://arxiv.org/abs/2005.09704
- Code: https://github.com/Atlas200dk/sample-imageinpainting-HiFill
**Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching**
- Paper: https://arxiv.org/abs/2005.03860
- Code: https://github.com/shiyujiao/cross_view_localization_DSM
**Epipolar Transformers**
- Paper: https://arxiv.org/abs/2005.04551
- Code: https://github.com/yihui-he/epipolar-transformers
**Bringing Old Photos Back to Life**
- Homepage: http://raywzy.com/Old_Photo/
- Paper: https://arxiv.org/abs/2004.09484
**MaskFlownet: Asymmetric Feature Matching with Learnable Occlusion Mask**
- Paper: https://arxiv.org/abs/2003.10955
- Code: https://github.com/microsoft/MaskFlownet
**Self-Supervised Viewpoint Learning from Image Collections**
- Paper: https://arxiv.org/abs/2004.01793
- Paper 2: https://research.nvidia.com/sites/default/files/pubs/2020-03_Self-Supervised-Viewpoint-Learning/SSV-CVPR2020.pdf
- Code: https://github.com/NVlabs/SSV
**Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations**
- Oral
- Paper: https://arxiv.org/abs/2003.12237
- Code: https://github.com/cuishuhao/BNM
**Towards Learning Structure via Consensus for Face Segmentation and Parsing**
- Paper: https://arxiv.org/abs/1911.00957
- Code: https://github.com/isi-vista/structure_via_consensus
**Plug-and-Play Algorithms for Large-scale Snapshot Compressive Imaging**
- Oral
- Paper: https://arxiv.org/abs/2003.13654
- Code: https://github.com/liuyang12/PnP-SCI
**Lightweight Photometric Stereo for Facial Details Recovery**
- Paper: https://arxiv.org/abs/2003.12307
- Code: https://github.com/Juyong/FacePSNet
**Footprints and Free Space from a Single Color Image**
- Paper: https://arxiv.org/abs/2004.06376
- Code: https://github.com/nianticlabs/footprints
**Self-Supervised Monocular Scene Flow Estimation**
- Paper: https://arxiv.org/abs/2004.04143
- Code: https://github.com/visinf/self-mono-sf
**Quasi-Newton Solver for Robust Non-Rigid Registration**
- Paper: https://arxiv.org/abs/2004.04322
- Code: https://github.com/Juyong/Fast_RNRR
**A Local-to-Global Approach to Multi-modal Movie Scene Segmentation**
- Homepage: https://anyirao.com/projects/SceneSeg.html
- Paper download link: https://arxiv.org/abs/2004.02678
- Code: https://github.com/AnyiRao/SceneSeg
**DeepFLASH: An Efficient Network for Learning-based Medical Image Registration**
- Paper: https://arxiv.org/abs/2004.02097
- Code: https://github.com/jw4hv/deepflash
**Self-Supervised Scene De-occlusion**
- Homepage: https://xiaohangzhan.github.io/projects/deocclusion/
- Paper: https://arxiv.org/abs/2004.02788
- Code: https://github.com/XiaohangZhan/deocclusion
**Polarized Reflection Removal with Perfect Alignment in the Wild**
- Homepage: https://leichenyang.weebly.com/project-polarized.html
- Code: https://github.com/ChenyangLEI/CVPR2020-Polarized-Reflection-Removal-with-Perfect-Alignment
**Background Matting: The World is Your Green Screen**
- Paper: https://arxiv.org/abs/2004.00626
- Code: http://github.com/senguptaumd/Background-Matting
**What Deep CNNs Benefit from Global Covariance Pooling: An Optimization Perspective**
- Paper: https://arxiv.org/abs/2003.11241
- Code: https://github.com/ZhangLi-CS/GCP_Optimization
**Look-into-Object: Self-supervised Structure Modeling for Object Recognition**
- Paper: not yet available
- Code: https://github.com/JDAI-CV/LIO
**Video Object Grounding using Semantic Roles in Language Description**
- Paper: https://arxiv.org/abs/2003.10606
- Code: https://github.com/TheShadow29/vognet-pytorch
**Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives**
- Paper: https://arxiv.org/abs/2003.10739
- Code: https://github.com/d-li14/DHM
**SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization**
- Paper: http://www.cs.umd.edu/~yuejiang/papers/SDFDiff.pdf
- Code: https://github.com/YueJiang-nj/CVPR2020-SDFDiff
**On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location**
- Paper: https://arxiv.org/abs/2003.07064
- Code: https://github.com/oskyhn/CNNs-Without-Borders
**GhostNet: More Features from Cheap Operations**
- Paper: https://arxiv.org/abs/1911.11907
- Code: https://github.com/iamhankai/ghostnet
**AdderNet: Do We Really Need Multiplications in Deep Learning?**
- Paper: https://arxiv.org/abs/1912.13200
- Code: https://github.com/huawei-noah/AdderNet
**Deep Image Harmonization via Domain Verification**
- Paper: https://arxiv.org/abs/1911.13239
- Code: https://github.com/bcmi/Image_Harmonization_Datasets
**Blurry Video Frame Interpolation**
- Paper: https://arxiv.org/abs/2002.12259
- Code: https://github.com/laomao0/BIN
**Extremely Dense Point Correspondences using a Learned Feature Descriptor**
- Paper: https://arxiv.org/abs/2003.00619
- Code: https://github.com/lppllppl920/DenseDescriptorLearning-Pytorch
**Filter Grafting for Deep Neural Networks**
- Paper: https://arxiv.org/abs/2001.05868
- Code: https://github.com/fxmeng/filter-grafting
- Paper explanation: https://www.zhihu.com/question/372070853/answer/1041569335
**Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation**
- Paper: https://arxiv.org/abs/2003.02824
- Code: https://github.com/cmhungsteve/SSTDA
**Detecting Attended Visual Targets in Video**
- Paper: https://arxiv.org/abs/2003.02501
- Code: https://github.com/ejcgt/attention-target-detection
**Deep Image Spatial Transformation for Person Image Generation**
- Paper: https://arxiv.org/abs/2003.00696
- Code: https://github.com/RenYurui/Global-Flow-Local-Attention
**Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications**
- Paper: https://arxiv.org/abs/2003.01455
- Code: https://github.com/bbrattoli/ZeroShotVideoClassification
https://github.com/charlesCXK/3D-SketchAware-SSC
https://github.com/Anonymous20192020/Anonymous_CVPR5767
https://github.com/avirambh/ScopeFlow
https://github.com/csbhr/CDVD-TSP
https://github.com/ymcidence/TBH
https://github.com/yaoyao-liu/mnemonics
https://github.com/meder411/Tangent-Images
https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch
https://github.com/sjmoran/deep_local_parametric_filters
https://github.com/bermanmaxim/AOWS
https://github.com/dc3ea9f/look-into-object
# Acceptance Unconfirmed
**FADNet: A Fast and Accurate Network for Disparity Estimation**
- Paper: not yet released
- Code: https://github.com/HKBU-HPML/FADNet
https://github.com/rFID-submit/RandomFID (acceptance unconfirmed)
https://github.com/JackSyu/AE-MSR (acceptance unconfirmed)
https://github.com/fastconvnets/cvpr2020 (acceptance unconfirmed)
https://github.com/aimagelab/meshed-memory-transformer (acceptance unconfirmed)
https://github.com/TWSFar/CRGNet (acceptance unconfirmed)
https://github.com/CVPR-2020/CDARTS (acceptance unconfirmed)
https://github.com/anucvml/ddn-cvprw2020 (acceptance unconfirmed)
https://github.com/dl-model-recommend/model-trust (acceptance unconfirmed)
https://github.com/apratimbhattacharyya18/CVPR-2020-Corr-Prior (acceptance unconfirmed)
https://github.com/onetcvpr/O-Net (acceptance unconfirmed)
https://github.com/502463708/Microcalcification_Detection (acceptance unconfirmed)
https://github.com/anonymous-for-review/cvpr-2020-deep-smoke-machine (acceptance unconfirmed)
https://github.com/anonymous-for-review/cvpr-2020-smoke-recognition-dataset (acceptance unconfirmed)
https://github.com/cvpr-nonrigid/dataset (acceptance unconfirmed)
https://github.com/theFool32/PPBA (acceptance unconfirmed)
https://github.com/Realtime-Action-Recognition/Realtime-Action-Recognition
================================================
FILE: CVPR2021-Papers-with-Code.md
================================================
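# CVPR 2021 Papers and Open-Source Projects (Papers with Code)
A collection of [CVPR 2021](http://cvpr2021.thecvf.com/) papers and open-source projects (papers with code).
CVPR 2021 accepted paper list: http://cvpr2021.thecvf.com/sites/default/files/2021-03/accepted_paper_ids.txt
> Note 1: Feel free to open an issue to share CVPR 2021 papers and open-source projects!
>
> Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision
If you want to keep up with the latest high-quality CV papers, open-source projects, and learning materials, scan the QR code to join the CVer academic discussion group. Let's learn from each other and improve together.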

## CVPR 2021 Papers with Code: Table of Contents
- [Best Paper](#Best-Paper)
- [Backbone](#Backbone)
- [NAS](#NAS)
- [GAN](#GAN)
- [VAE](#VAE)
- [Visual Transformer](#Visual-Transformer)
- [Regularization](#Regularization)
- [SLAM](#SLAM)
- [Long-Tailed Distribution](#Long-Tailed)
- [Data Augmentation](#DA)
- [Unsupervised/Self-Supervised](#Un/Self-Supervised)
- [Semi-Supervised](#Semi-Supervised)
- [Capsule Network](#Capsule-Network)
- [Image Classification](#Image-Classification)
- [2D Object Detection](#Object-Detection)
- [Single/Multi-Object Tracking](#Object-Tracking)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Panoptic Segmentation](#Panoptic-Segmentation)
- [Medical Image Segmentation](#Medical-Image-Segmentation)
- [Video Object Segmentation](#VOS)
- [Interactive Video Object Segmentation](#IVOS)
- [Saliency Detection](#Saliency-Detection)
- [Camouflaged Object Detection](#Camouflaged-Object-Detection)
- [Co-Salient Object Detection](#CoSOD)
- [Image Matting](#Matting)
- [Person Re-identification](#Re-ID)
- [Person Search](#Person-Search)
- [Video Understanding/Action Recognition](#Video-Understanding)
- [Face Recognition](#Face-Recognition)
- [Face Detection](#Face-Detection)
- [Face Anti-Spoofing](#Face-Anti-Spoofing)
- [Deepfake Detection](#Deepfake-Detection)
- [Facial Age Estimation](#Age-Estimation)
- [Facial Expression Recognition](#FER)
- [Deepfakes](#Deepfakes)
- [Human Parsing](#Human-Parsing)
- [2D/3D Human Pose Estimation](#Human-Pose-Estimation)
- [Animal Pose Estimation](#Animal-Pose-Estimation)
- [Hand Pose Estimation](#Hand-Pose-Estimation)
- [Human Volumetric Capture](#Human-Volumetric-Capture)
- [Scene Text Recognition](#Scene-Text-Recognition)
- [Image Compression](#Image-Compression)
- [Model Compression/Pruning/Quantization](#Model-Compression)
- [Knowledge Distillation](#KD)
- [Super-Resolution](#Super-Resolution)
- [Dehazing](#Dehazing)
- [Image Restoration](#Image-Restoration)
- [Image Inpainting](#Image-Inpainting)
- [Image Editing](#Image-Editing)
- [Image Captioning](#Image-Captioning)
- [Font Generation](#Font-Generation)
- [Image Matching](#Image-Matching)
- [Image Blending](#Image-Blending)
- [Reflection Removal](#Reflection-Removal)
- [3D Point Cloud Classification](#3D-C)
- [3D Object Detection](#3D-Object-Detection)
- [3D Semantic Segmentation](#3D-Semantic-Segmentation)
- [3D Panoptic Segmentation](#3D-Panoptic-Segmentation)
- [3D Object Tracking](#3D-Object-Tracking)
- [3D Point Cloud Registration](#3D-PointCloud-Registration)
- [3D Point Cloud Completion](#3D-Point-Cloud-Completion)
- [3D Reconstruction](#3D-Reconstruction)
- [6D Pose Estimation](#6D-Pose-Estimation)
- [Camera Pose Estimation](#Camera-Pose-Estimation)
- [Depth Estimation](#Depth-Estimation)
- [Stereo Matching](#Stereo-Matching)
- [Optical Flow Estimation](#Flow-Estimation)
- [Lane Detection](#Lane-Detection)
- [Trajectory Prediction](#Trajectory-Prediction)
- [Crowd Counting](#Crowd-Counting)
- [Adversarial Examples](#AE)
- [Image Retrieval](#Image-Retrieval)
- [Video Retrieval](#Video-Retrieval)
- [Cross-modal Retrieval](#Cross-modal-Retrieval)
- [Zero-Shot Learning](#Zero-Shot-Learning)
- [Federated Learning](#Federated-Learning)
- [Video Frame Interpolation](#Video-Frame-Interpolation)
- [Visual Reasoning](#Visual-Reasoning)
- [Image Synthesis](#Image-Synthesis)
- [View Synthesis](#Visual-Synthesis)
- [Style Transfer](#Style-Transfer)
- [Layout Generation](#Layout-Generation)
- [Domain Generalization](#Domain-Generalization)
- [Domain Adaptation](#Domain-Adaptation)
- [Open-Set](#Open-Set)
- [Adversarial Attack](#Adversarial-Attack)
- [Human-Object Interaction (HOI) Detection](#HOI)
- [Shadow Removal](#Shadow-Removal)
- [Virtual Try-On](#Virtual-Try-On)
- [Label Noise](#Label-Noise)
- [Video Stabilization](#Video-Stabilization)
- [Datasets](#Datasets)
- [Others](#Others)
- [To Be Added (TODO)](#TO-DO)
- [Acceptance Unconfirmed (Not Sure)](#Not-Sure)
# Best Paper
**GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields**
- Homepage: https://m-niemeyer.github.io/project-pages/giraffe/index.html
- Paper(Oral): https://arxiv.org/abs/2011.12100
- Code: https://github.com/autonomousvision/giraffe
- Demo: http://www.youtube.com/watch?v=fIaDXC-qRSg&vq=hd1080&autoplay=1
# Backbone
**HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers**
- Paper(Oral): https://arxiv.org/abs/2106.06560
- Code: https://github.com/dingmyu/HR-NAS
**BCNet: Searching for Network Width with Bilaterally Coupled Network**
- Paper: https://arxiv.org/abs/2105.10533
- Code: None
**Decoupled Dynamic Filter Networks**
- Homepage: https://thefoxofsky.github.io/project_pages/ddf
- Paper: https://arxiv.org/abs/2104.14107
- Code: https://github.com/thefoxofsky/DDF
**Lite-HRNet: A Lightweight High-Resolution Network**
- Paper: https://arxiv.org/abs/2104.06403
- Code: https://github.com/HRNet/Lite-HRNet
**CondenseNet V2: Sparse Feature Reactivation for Deep Networks**
- Paper: https://arxiv.org/abs/2104.04382
- Code: https://github.com/jianghaojun/CondenseNetV2
**Diverse Branch Block: Building a Convolution as an Inception-like Unit**
- Paper: https://arxiv.org/abs/2103.13425
- Code: https://github.com/DingXiaoH/DiverseBranchBlock
**Scaling Local Self-Attention For Parameter Efficient Visual Backbones**
- Paper(Oral): https://arxiv.org/abs/2103.12731
- Code: None
**ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network**
- Paper: https://arxiv.org/abs/2007.00992
- Code: https://github.com/clovaai/rexnet
**Involution: Inverting the Inherence of Convolution for Visual Recognition**
- Paper: https://arxiv.org/abs/2103.06255
- Code: https://github.com/d-li14/involution
**Coordinate Attention for Efficient Mobile Network Design**
- Paper: https://arxiv.org/abs/2103.02907
- Code: https://github.com/Andrew-Qibin/CoordAttention
**Inception Convolution with Efficient Dilation Search**
- Paper: https://arxiv.org/abs/2012.13587
- Code: https://github.com/yifan123/IC-Conv
**RepVGG: Making VGG-style ConvNets Great Again**
- Paper: https://arxiv.org/abs/2101.03697
- Code: https://github.com/DingXiaoH/RepVGG
# NAS
**HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers**
- Paper(Oral): https://arxiv.org/abs/2106.06560
- Code: https://github.com/dingmyu/HR-NAS
**BCNet: Searching for Network Width with Bilaterally Coupled Network**
- Paper: https://arxiv.org/abs/2105.10533
- Code: None
**ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search**
- Paper: https://arxiv.org/abs/2105.10154
- Code: None
**Combined Depth Space based Architecture Search For Person Re-identification**
- Paper: https://arxiv.org/abs/2104.04163
- Code: None
**DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation**
- Paper(Oral): https://arxiv.org/abs/2103.15954
- Code: None
**Neural Architecture Search with Random Labels**
- Paper: https://arxiv.org/abs/2101.11834
- Code: None
**Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search**
- Paper: https://arxiv.org/abs/2101.11342
- Code: None
**Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation**
- Paper: https://arxiv.org/abs/2105.12971
- Code: None
**Prioritized Architecture Sampling with Monto-Carlo Tree Search**
- Paper: https://arxiv.org/abs/2103.11922
- Code: https://github.com/xiusu/NAS-Bench-Macro
**Contrastive Neural Architecture Search with Neural Architecture Comparators**
- Paper: https://arxiv.org/abs/2103.05471
- Code: https://github.com/chenyaofo/CTNAS
**AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling**
- Paper: https://arxiv.org/abs/2011.09011
- Code: None
**ReNAS: Relativistic Evaluation of Neural Architecture Search**
- Paper: https://arxiv.org/abs/1910.01523
- Code: None
**HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens**
- Paper: https://arxiv.org/abs/2005.14446
- Code: None
**Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator**
- Paper: https://arxiv.org/abs/2103.07289
- Code: https://github.com/eric8607242/SGNAS
**OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection**
- Paper: https://arxiv.org/abs/2103.04507
- Code: https://github.com/VDIGPKU/OPANAS
**Inception Convolution with Efficient Dilation Search**
- Paper: https://arxiv.org/abs/2012.13587
- Code: https://github.com/yifan123/IC-Conv
# GAN
**High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network**
- Paper: https://arxiv.org/abs/2105.09188
- Code: https://github.com/csjliang/LPTN
- Dataset: https://github.com/csjliang/LPTN
**DG-Font: Deformable Generative Networks for Unsupervised Font Generation**
- Paper: https://arxiv.org/abs/2104.03064
- Code: https://github.com/ecnuycxie/DG-Font
**PD-GAN: Probabilistic Diverse GAN for Image Inpainting**
- Paper: https://arxiv.org/abs/2105.02201
- Code: https://github.com/KumapowerLIU/PD-GAN
**StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing**
- Paper: https://arxiv.org/abs/2104.14754
- Code: https://github.com/naver-ai/StyleMapGAN
- Demo Video: https://youtu.be/qCapNyRA_Ng
**Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer**
- Paper: https://arxiv.org/abs/2104.05376
- Code: https://github.com/PaddlePaddle/PaddleGAN/
**Regularizing Generative Adversarial Networks under Limited Data**
- Homepage: https://hytseng0509.github.io/lecam-gan/
- Paper: https://faculty.ucmerced.edu/mhyang/papers/cvpr2021_gan_limited_data.pdf
- Code: https://github.com/google/lecam-gan
**Towards Real-World Blind Face Restoration with Generative Facial Prior**
- Paper: https://arxiv.org/abs/2101.04061
- Code: None
**TediGAN: Text-Guided Diverse Image Generation and Manipulation**
- Homepage: https://xiaweihao.com/projects/tedigan/
- Paper: https://arxiv.org/abs/2012.03308
- Code: https://github.com/weihaox/TediGAN
**Generative Hierarchical Features from Synthesizing Images**
- Homepage: https://genforce.github.io/ghfeat/
- Paper(Oral): https://arxiv.org/abs/2007.10379
- Code: https://github.com/genforce/ghfeat
**Teachers Do More Than Teach: Compressing Image-to-Image Models**
- Paper: https://arxiv.org/abs/2103.03467
- Code: https://github.com/snap-research/CAT
**HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms**
- Paper: https://arxiv.org/abs/2011.11731
- Code: https://github.com/mahmoudnafifi/HistoGAN
**pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis**
- Homepage: https://marcoamonteiro.github.io/pi-GAN-website/
- Paper(Oral): https://arxiv.org/abs/2012.00926
- Code: None
**DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network**
- Paper: https://arxiv.org/abs/2103.07893
- Code: None
**Diverse Semantic Image Synthesis via Probability Distribution Modeling**
- Paper: https://arxiv.org/abs/2103.06878
- Code: https://github.com/tzt101/INADE.git
**LOHO: Latent Optimization of Hairstyles via Orthogonalization**
- Paper: https://arxiv.org/abs/2103.03891
- Code: None
**PISE: Person Image Synthesis and Editing with Decoupled GAN**
- Paper: https://arxiv.org/abs/2103.04023
- Code: https://github.com/Zhangjinso/PISE
**DeFLOCNet: Deep Image Editing via Flexible Low-level Controls**
- Paper: http://raywzy.com/
- Code: http://raywzy.com/
**Efficient Conditional GAN Transfer with Knowledge Propagation across Classes**
- Paper: https://www.researchgate.net/publication/349309756_Efficient_Conditional_GAN_Transfer_with_Knowledge_Propagation_across_Classes
- Code: http://github.com/mshahbazi72/cGANTransfer
**Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs**
- Paper: https://arxiv.org/abs/2011.14107
- Code: None
**Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation**
- Homepage: https://eladrich.github.io/pixel2style2pixel/
- Paper: https://arxiv.org/abs/2008.00951
- Code: https://github.com/eladrich/pixel2style2pixel
**A 3D GAN for Improved Large-pose Facial Recognition**
- Paper: https://arxiv.org/abs/2012.10545
- Code: None
**HumanGAN: A Generative Model of Humans Images**
- Paper: https://arxiv.org/abs/2103.06902
- Code: None
**ID-Unet: Iterative Soft and Hard Deformation for View Synthesis**
- Paper: https://arxiv.org/abs/2103.02264
- Code: https://github.com/MingyuY/Iterative-view-synthesis
**CoMoGAN: continuous model-guided image-to-image translation**
- Paper(Oral): https://arxiv.org/abs/2103.06879
- Code: https://github.com/cv-rits/CoMoGAN
**Training Generative Adversarial Networks in One Stage**
- Paper: https://arxiv.org/abs/2103.00430
- Code: None
**Closed-Form Factorization of Latent Semantics in GANs**
- Homepage: https://genforce.github.io/sefa/
- Paper(Oral): https://arxiv.org/abs/2007.06600
- Code: https://github.com/genforce/sefa
**Anycost GANs for Interactive Image Synthesis and Editing**
- Paper: https://arxiv.org/abs/2103.03243
- Code: https://github.com/mit-han-lab/anycost-gan
**Image-to-image Translation via Hierarchical Style Disentanglement**
- Paper: https://arxiv.org/abs/2103.01456
- Code: https://github.com/imlixinyang/HiSD
# VAE
**Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders**
- Homepage: https://taldatech.github.io/soft-intro-vae-web/
- Paper: https://arxiv.org/abs/2012.13253
- Code: https://github.com/taldatech/soft-intro-vae-pytorch
# Visual Transformer
**1. End-to-End Human Pose and Mesh Reconstruction with Transformers**
- Paper: https://arxiv.org/abs/2012.09760
- Code: https://github.com/microsoft/MeshTransformer
**2. Temporal-Relational CrossTransformers for Few-Shot Action Recognition**
- Paper: https://arxiv.org/abs/2101.06184
- Code: https://github.com/tobyperrett/trx
**3. Kaleido-BERT: Vision-Language Pre-training on Fashion Domain**
- Paper: https://arxiv.org/abs/2103.16110
- Code: https://github.com/mczhuge/Kaleido-BERT
**4. HOTR: End-to-End Human-Object Interaction Detection with Transformers**
- Paper: https://arxiv.org/abs/2104.13682
- Code: https://github.com/kakaobrain/HOTR
**5. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving**
- Paper: https://arxiv.org/abs/2104.09224
- Code: https://github.com/autonomousvision/transfuser
**6. Pose Recognition with Cascade Transformers**
- Paper: https://arxiv.org/abs/2104.06976
- Code: https://github.com/mlpc-ucsd/PRTR
**7. Variational Transformer Networks for Layout Generation**
- Paper: https://arxiv.org/abs/2104.02416
- Code: None
**8. LoFTR: Detector-Free Local Feature Matching with Transformers**
- Homepage: https://zju3dv.github.io/loftr/
- Paper: https://arxiv.org/abs/2104.00680
- Code: https://github.com/zju3dv/LoFTR
**9. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers**
- Paper: https://arxiv.org/abs/2012.15840
- Code: https://github.com/fudan-zvg/SETR
**10. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers**
- Paper: https://arxiv.org/abs/2103.16553
- Code: None
**11. Transformer Tracking**
- Paper: https://arxiv.org/abs/2103.15436
- Code: https://github.com/chenxin-dlut/TransT
**12. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers**
- Paper(Oral): https://arxiv.org/abs/2106.06560
- Code: https://github.com/dingmyu/HR-NAS
**13. MIST: Multiple Instance Spatial Transformer**
- Paper: https://arxiv.org/abs/1811.10725
- Code: None
**14. Multimodal Motion Prediction with Stacked Transformers**
- Paper: https://arxiv.org/abs/2103.11624
- Code: https://decisionforce.github.io/mmTransformer
**15. Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning**
- Paper: https://www.amazon.science/publications/revamping-cross-modal-recipe-retrieval-with-hierarchical-transformers-and-self-supervised-learning
- Code: https://github.com/amzn/image-to-recipe-transformers
**16. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking**
- Paper(Oral): https://arxiv.org/abs/2103.11681
- Code: https://github.com/594422814/TransformerTrack
**17. Pre-Trained Image Processing Transformer**
- Paper: https://arxiv.org/abs/2012.00364
- Code: None
**18. End-to-End Video Instance Segmentation with Transformers**
- Paper(Oral): https://arxiv.org/abs/2011.14503
- Code: https://github.com/Epiphqny/VisTR
**19. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers**
- Paper(Oral): https://arxiv.org/abs/2011.09094
- Code: https://github.com/dddzg/up-detr
**20. End-to-End Human Object Interaction Detection with HOI Transformer**
- Paper: https://arxiv.org/abs/2103.04503
- Code: https://github.com/bbepoch/HoiTransformer
**21. Transformer Interpretability Beyond Attention Visualization**
- Paper: https://arxiv.org/abs/2012.09838
- Code: https://github.com/hila-chefer/Transformer-Explainability
**22. Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer**
- Paper: None
- Code: None
**23. LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity**
- Paper: None
- Code: None
**24. Line Segment Detection Using Transformers without Edges**
- Paper(Oral): https://arxiv.org/abs/2101.01909
- Code: None
**25. MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers**
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_MaX-DeepLab_End-to-End_Panoptic_Segmentation_With_Mask_Transformers_CVPR_2021_paper.html
- Code: None
**26. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation**
- Paper(Oral): https://arxiv.org/abs/2101.08833
- Code: https://github.com/dukebw/SSTVOS
**27. Facial Action Unit Detection With Transformers**
- Paper: None
- Code: None
**28. Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition**
- Paper: None
- Code: None
**29. Lesion-Aware Transformers for Diabetic Retinopathy Grading**
- Paper: None
- Code: None
**30. Topological Planning With Transformers for Vision-and-Language Navigation**
- Paper: https://arxiv.org/abs/2012.05292
- Code: None
**31. Adaptive Image Transformer for One-Shot Object Detection**
- Paper: None
- Code: None
**32. Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos**
- Paper: None
- Code: None
**33. Taming Transformers for High-Resolution Image Synthesis**
- Homepage: https://compvis.github.io/taming-transformers/
- Paper(Oral): https://arxiv.org/abs/2012.09841
- Code: https://github.com/CompVis/taming-transformers
**34. Self-Supervised Video Hashing via Bidirectional Transformers**
- Paper: None
- Code: None
**35. Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos**
- Paper(Oral): https://hehefan.github.io/pdfs/p4transformer.pdf
- Code: None
**36. Gaussian Context Transformer**
- Paper: None
- Code: None
**37. General Multi-Label Image Classification With Transformers**
- Paper: https://arxiv.org/abs/2011.14027
- Code: None
**38. Bottleneck Transformers for Visual Recognition**
- Paper: https://arxiv.org/abs/2101.11605
- Code: None
**39. VLN BERT: A Recurrent Vision-and-Language BERT for Navigation**
- Paper(Oral): https://arxiv.org/abs/2011.13922
- Code: https://github.com/YicongHong/Recurrent-VLN-BERT
**40. Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling**
- Paper(Oral): https://arxiv.org/abs/2102.06183
- Code: https://github.com/jayleicn/ClipBERT
**41. Self-attention based Text Knowledge Mining for Text Detection**
- Paper: None
- Code: https://github.com/CVI-SZU/STKM
**42. SSAN: Separable Self-Attention Network for Video Representation Learning**
- Paper: None
- Code: None
**43. Scaling Local Self-Attention For Parameter Efficient Visual Backbones**
- Paper(Oral): https://arxiv.org/abs/2103.12731
- Code: None
# Regularization
**Regularizing Neural Networks via Adversarial Model Perturbation**
- Paper: https://arxiv.org/abs/2010.04925
- Code: https://github.com/hiyouga/AMP-Regularizer
# SLAM
**Differentiable SLAM-net: Learning Particle SLAM for Visual Navigation**
- Paper: https://arxiv.org/abs/2105.07593
- Code: None
**Generalizing to the Open World: Deep Visual Odometry with Online Adaptation**
- Paper: https://arxiv.org/abs/2103.15279
- Code: None
# Long-Tailed Distribution
**Adversarial Robustness under Long-Tailed Distribution**
- Paper(Oral): https://arxiv.org/abs/2104.02703
- Code: https://github.com/wutong16/Adversarial_Long-Tail
**Distribution Alignment: A Unified Framework for Long-tail Visual Recognition**
- Paper: https://arxiv.org/abs/2103.16370
- Code: https://github.com/Megvii-BaseDetection/DisAlign
**Adaptive Class Suppression Loss for Long-Tail Object Detection**
- Paper: https://arxiv.org/abs/2104.00885
- Code: https://github.com/CASIA-IVA-Lab/ACSL
**Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification**
- Paper: https://arxiv.org/abs/2103.14267
- Code: None
# Data Augmentation
**Scale-aware Automatic Augmentation for Object Detection**
- Paper: https://arxiv.org/abs/2103.17220
- Code: https://github.com/Jia-Research-Lab/SA-AutoAug
# Unsupervised/Self-Supervised Learning
**Domain-Specific Suppression for Adaptive Object Detection**
- Paper: https://arxiv.org/abs/2105.03570
- Code: None
**A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning**
- Paper: https://arxiv.org/abs/2104.14558
- Code: https://github.com/facebookresearch/SlowFast
**Unsupervised Multi-Source Domain Adaptation for Person Re-Identification**
- Paper: https://arxiv.org/abs/2104.12961
- Code: None
**Self-supervised Video Representation Learning by Context and Motion Decoupling**
- Paper: https://arxiv.org/abs/2104.00862
- Code: None
**Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning**
- Homepage: https://fingerrec.github.io/index_files/jinpeng/papers/CVPR2021/project_website.html
- Paper: https://arxiv.org/abs/2009.05769
- Code: https://github.com/FingerRec/BE
**Spatially Consistent Representation Learning**
- Paper: https://arxiv.org/abs/2103.06122
- Code: None
**VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples**
- Paper: https://arxiv.org/abs/2103.05905
- Code: https://github.com/tinapan-pt/VideoMoCo
**Exploring Simple Siamese Representation Learning**
- Paper(Oral): https://arxiv.org/abs/2011.10566
- Code: None
**Dense Contrastive Learning for Self-Supervised Visual Pre-Training**
- Paper(Oral): https://arxiv.org/abs/2011.09157
- Code: https://github.com/WXinlong/DenseCL
# Semi-Supervised Learning
**Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework**
- Affiliations: Alibaba
- Paper: https://arxiv.org/abs/2103.11402
- Code: None
**Adaptive Consistency Regularization for Semi-Supervised Transfer Learning**
- Paper: https://arxiv.org/abs/2103.02193
- Code: https://github.com/SHI-Labs/Semi-Supervised-Transfer-Learning
# Capsule Network
**Capsule Network is Not More Robust than Convolutional Network**
- Paper: https://arxiv.org/abs/2103.15459
- Code: None
# Image Classification
**Correlated Input-Dependent Label Noise in Large-Scale Image Classification**
- Paper(Oral): https://arxiv.org/abs/2105.10305
- Code: https://github.com/google/uncertainty-baselines/tree/master/baselines/imagenet
# 2D Object Detection
## 2D Object Detection
**1. Scaled-YOLOv4: Scaling Cross Stage Partial Network**
- Affiliations: Academia Sinica, Intel, Providence University
- Paper: https://arxiv.org/abs/2011.08036
- Code: https://github.com/WongKinYiu/ScaledYOLOv4
- Chinese write-up: [The official improved YOLOv4 is here! 55.8% AP, up to 1774 FPS: Scaled-YOLOv4 is now open source](https://mp.weixin.qq.com/s/AcrJPNoAVhn8cGBUGK7ekA)
**2. You Only Look One-level Feature**
- Affiliations: Chinese Academy of Sciences (CAS), University of Chinese Academy of Sciences (UCAS), Megvii
- Paper: https://arxiv.org/abs/2103.09460
- Code: https://github.com/megvii-model/YOLOF
- Chinese write-up: [CVPR 2021 | No FPN! CAS and Megvii propose YOLOF: You Only Look One-level Feature](https://mp.weixin.qq.com/s/EJqAG1gTVaP2icI6QL742A)
**3. Sparse R-CNN: End-to-End Object Detection with Learnable Proposals**
- Affiliations: The University of Hong Kong, Tongji University, ByteDance AI Lab, UC Berkeley
- Paper: https://arxiv.org/abs/2011.12450
- Code: https://github.com/PeizeSun/SparseR-CNN
- Chinese write-up: [A new paradigm for object detection! HKU, Tongji, and Berkeley propose Sparse R-CNN, with code just released](https://mp.weixin.qq.com/s/P2Zgh1wTqf8L2976El5nfQ)
**4. End-to-End Object Detection with Fully Convolutional Network**
- Affiliations: Megvii, Xi'an Jiaotong University
- Paper: https://arxiv.org/abs/2012.03544
- Code: https://github.com/Megvii-BaseDetection/DeFCN
**5. Dynamic Head: Unifying Object Detection Heads with Attentions**
- Affiliations: Microsoft
- Paper: https://arxiv.org/abs/2106.08322
- Code: https://github.com/microsoft/DynamicHead
- Chinese write-up: [60.6 AP, a new COCO record! Microsoft proposes DyHead: unifying attention with object detection heads](https://mp.weixin.qq.com/s/uYPUqVXwNau71VAYW3bYIA)
**6. Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection**
- Affiliations: Nanjing University of Science and Technology, Momenta, Nanjing University, Tsinghua University
- Paper: https://arxiv.org/abs/2011.12885
- Code: https://github.com/implus/GFocalV2
- Chinese write-up: [CVPR 2021 | GFLV2: a solid object detection technique with accuracy gains at no cost](https://mp.weixin.qq.com/s/JB7k3NwXU-cDueg6w9mghQ)
**7. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers**
- Affiliations: South China University of Technology, Tencent WeChat AI
- Paper(Oral): https://arxiv.org/abs/2011.09094
- Code: https://github.com/dddzg/up-detr
- Chinese write-up: [CVPR 2021 Oral | Transformers strike again! SCUT and WeChat propose UP-DETR: an unsupervised pre-trained detector](https://mp.weixin.qq.com/s/Hprp7B16SGFhVEKXfKiRBQ)
**8. MobileDets: Searching for Object Detection Architectures for Mobile Accelerators**
- Affiliations: University of Wisconsin, Google
- Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Xiong_MobileDets_Searching_for_Object_Detection_Architectures_for_Mobile_Accelerators_CVPR_2021_paper.pdf
- Code: https://github.com/tensorflow/models/tree/master/research/object_detection
**9. Tracking Pedestrian Heads in Dense Crowd**
- Affiliations: University of Rennes 1
- Homepage: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sundararaman_Tracking_Pedestrian_Heads_in_Dense_Crowd_CVPR_2021_paper.html
- Code1: https://github.com/Sentient07/HeadHunter
- Code2: https://github.com/Sentient07/HeadHunter--T
- Dataset: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/
**10. Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation**
- Affiliations: The Hong Kong University of Science and Technology, Huawei Noah's Ark Lab
- Paper: https://arxiv.org/abs/2105.12971
- Code: None
**11. PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery**
- Affiliations: A*STAR, Sichuan University, Nanyang Technological University
- Paper: https://arxiv.org/abs/2105.12990
- Code: None
**12. IQDet: Instance-wise Quality Distribution Sampling for Object Detection**
- Affiliations: Megvii
- Paper: https://arxiv.org/abs/2104.06936
- Code: None
**13. Multi-Scale Aligned Distillation for Low-Resolution Detection**
- Affiliations: The Chinese University of Hong Kong, Adobe Research, SmartMore
- Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf
- Code: https://github.com/Jia-Research-Lab/MSAD
**14. Adaptive Class Suppression Loss for Long-Tail Object Detection**
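- Affiliations: Chinese Academy of Sciences (CAS), University of Chinese Academy of Sciences (UCAS), ObjectEye, Peking University, Peng Cheng Laboratory, Nexwise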
- Paper: https://arxiv.org/abs/2104.00885
- Code: https://github.com/CASIA-IVA-Lab/ACSL
**15. VarifocalNet: An IoU-aware Dense Object Detector**
- Affiliations: Queensland University of Technology, The University of Queensland
- Paper(Oral): https://arxiv.org/abs/2008.13367
- Code: https://github.com/hyz-xmaster/VarifocalNet
**16. OTA: Optimal Transport Assignment for Object Detection**
- Affiliations: Waseda University, Megvii
- Paper: https://arxiv.org/abs/2103.14259
- Code: https://github.com/Megvii-BaseDetection/OTA
**17. Distilling Object Detectors via Decoupled Features**
- Affiliations: Huawei Noah's Ark Lab, The University of Sydney
- Paper: https://arxiv.org/abs/2103.14475
- Code: https://github.com/ggjy/DeFeat.pytorch
**18. Robust and Accurate Object Detection via Adversarial Learning**
- Affiliations: Google, UCLA, UCSC
- Paper: https://arxiv.org/abs/2103.13886
- Code: None
**19. OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection**
- Affiliations: Peking University, Anyvision, Stony Brook University
- Paper: https://arxiv.org/abs/2103.04507
- Code: https://github.com/VDIGPKU/OPANAS
**20. Multiple Instance Active Learning for Object Detection**
- Affiliations: University of Chinese Academy of Sciences (UCAS), Huawei Noah's Ark Lab, Tsinghua University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Yuan_Multiple_Instance_Active_Learning_for_Object_Detection_CVPR_2021_paper.pdf
- Code: https://github.com/yuantn/MI-AOD
**21. Towards Open World Object Detection**
- Affiliations: Indian Institute of Technology, MBZUAI, Australian National University, Linköping University
- Paper(Oral): https://arxiv.org/abs/2103.02603
- Code: https://github.com/JosephKJ/OWOD
**22. RankDetNet: Delving Into Ranking Constraints for Object Detection**
- Affiliations: Xilinx
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_RankDetNet_Delving_Into_Ranking_Constraints_for_Object_Detection_CVPR_2021_paper.html
- Code: None
## Rotated Object Detection
**23. Dense Label Encoding for Boundary Discontinuity Free Rotation Detection**
- Affiliations: Shanghai Jiao Tong University, University of Chinese Academy of Sciences (UCAS)
- Paper: https://arxiv.org/abs/2011.09670
- Code1: https://github.com/Thinklab-SJTU/DCL_RetinaNet_Tensorflow
- Code2: https://github.com/yangxue0827/RotationDetection
**24. ReDet: A Rotation-equivariant Detector for Aerial Object Detection**
- Affiliations: Wuhan University
- Paper: https://arxiv.org/abs/2103.07733
- Code: https://github.com/csuhan/ReDet
**25. Beyond Bounding-Box: Convex-Hull Feature Adaptation for Oriented and Densely Packed Object Detection**
- Affiliations: University of Chinese Academy of Sciences (UCAS), Tsinghua University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Guo_Beyond_Bounding-Box_Convex-Hull_Feature_Adaptation_for_Oriented_and_Densely_Packed_CVPR_2021_paper.html
- Code: https://github.com/SDL-GuoZonghao/BeyondBoundingBox
## Few-Shot Object Detection
**26. Accurate Few-Shot Object Detection With Support-Query Mutual Guidance and Hybrid Loss**
- Affiliations: Fudan University, Tongji University, Zhejiang University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhang_Accurate_Few-Shot_Object_Detection_With_Support-Query_Mutual_Guidance_and_Hybrid_CVPR_2021_paper.html
- Code: None
**27. Adaptive Image Transformer for One-Shot Object Detection**
- Affiliations: Academia Sinica, Taiwan AI Labs
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Adaptive_Image_Transformer_for_One-Shot_Object_Detection_CVPR_2021_paper.html
- Code: None
**28. Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection**
- Affiliations: Peking University, Beijing University of Posts and Telecommunications
- Paper: https://arxiv.org/abs/2103.17115
- Code: https://github.com/hzhupku/DCNet
**29. Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection**
- Affiliations: Carnegie Mellon University (CMU)
- Paper: https://arxiv.org/abs/2103.01903
- Code: None
**30. FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding**
- Affiliations: University of Southern California, Megvii
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sun_FSCE_Few-Shot_Object_Detection_via_Contrastive_Proposal_Encoding_CVPR_2021_paper.html
- Code: https://github.com/MegviiDetection/FSCE
**31. Hallucination Improves Few-Shot Object Detection**
- Affiliations: University of Illinois Urbana-Champaign
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhang_Hallucination_Improves_Few-Shot_Object_Detection_CVPR_2021_paper.html
- Code: https://github.com/pppplin/HallucFsDet
**32. Few-Shot Object Detection via Classification Refinement and Distractor Retreatment**
- Affiliations: National University of Singapore, SIMTech
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Few-Shot_Object_Detection_via_Classification_Refinement_and_Distractor_Retreatment_CVPR_2021_paper.html
- Code: None
**33. Generalized Few-Shot Object Detection Without Forgetting**
- Affiliations: Megvii
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Fan_Generalized_Few-Shot_Object_Detection_Without_Forgetting_CVPR_2021_paper.html
- Code: None
**34. Transformation Invariant Few-Shot Object Detection**
- Affiliations: Huawei Noah's Ark Lab
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Transformation_Invariant_Few-Shot_Object_Detection_CVPR_2021_paper.html
- Code: None
**35. UniT: Unified Knowledge Transfer for Any-Shot Object Detection and Segmentation**
- Affiliations: University of British Columbia, Vector AI, CIFAR AI Chair
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Khandelwal_UniT_Unified_Knowledge_Transfer_for_Any-Shot_Object_Detection_and_Segmentation_CVPR_2021_paper.html
- Code: https://github.com/ubc-vision/UniT
**36. Beyond Max-Margin: Class Margin Equilibrium for Few-Shot Object Detection**
- Affiliations: University of Chinese Academy of Sciences (UCAS), Xiamen University, Peng Cheng Laboratory
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Beyond_Max-Margin_Class_Margin_Equilibrium_for_Few-Shot_Object_Detection_CVPR_2021_paper.html
- Code: https://github.com/Bohao-Lee/CME
## Semi-Supervised Object Detection
**37. Points As Queries: Weakly Semi-Supervised Object Detection by Points**
- Affiliations: Megvii, Fudan University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Points_As_Queries_Weakly_Semi-Supervised_Object_Detection_by_Points_CVPR_2021_paper.html
- Code: None
**38. Data-Uncertainty Guided Multi-Phase Learning for Semi-Supervised Object Detection**
- Affiliations: Tsinghua University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Data-Uncertainty_Guided_Multi-Phase_Learning_for_Semi-Supervised_Object_Detection_CVPR_2021_paper.html
- Code: None
**39. Positive-Unlabeled Data Purification in the Wild for Object Detection**
- Affiliations: Huawei Noah's Ark Lab, The University of Sydney, Peking University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Guo_Positive-Unlabeled_Data_Purification_in_the_Wild_for_Object_Detection_CVPR_2021_paper.html
- Code: None
**40. Interactive Self-Training With Mean Teachers for Semi-Supervised Object Detection**
- Affiliations: Alibaba, The Hong Kong Polytechnic University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Yang_Interactive_Self-Training_With_Mean_Teachers_for_Semi-Supervised_Object_Detection_CVPR_2021_paper.html
- Code: None
**41. Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework**
- Affiliations: Alibaba
- Paper: https://arxiv.org/abs/2103.11402
- Code: None
**42. Humble Teachers Teach Better Students for Semi-Supervised Object Detection**
- Affiliations: Carnegie Mellon University (CMU), Amazon
- Homepage: https://yihet.com/humble-teacher
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Tang_Humble_Teachers_Teach_Better_Students_for_Semi-Supervised_Object_Detection_CVPR_2021_paper.html
- Code: https://github.com/lryta/HumbleTeacher
**43. Interpolation-Based Semi-Supervised Learning for Object Detection**
- Affiliations: Seoul National University, Aalto University, et al.
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Jeong_Interpolation-Based_Semi-Supervised_Learning_for_Object_Detection_CVPR_2021_paper.html
- Code: https://github.com/soo89/ISD-SSD
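## Domain Adaptive Object Detection
**44. Domain-Specific Suppression for Adaptive Object Detection**
- Affiliations: Chinese Academy of Sciences, Cambricon, University of Chinese Academy of Sciences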
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Domain-Specific_Suppression_for_Adaptive_Object_Detection_CVPR_2021_paper.html
- Code: None
**45. MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection**
- Affiliations: Johns Hopkins University, Mercedes-Benz
- Paper: https://arxiv.org/abs/2103.04224
- Code: None
**46. Unbiased Mean Teacher for Cross-Domain Object Detection**
- Affiliations: University of Electronic Science and Technology of China, ETH Zurich
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Deng_Unbiased_Mean_Teacher_for_Cross-Domain_Object_Detection_CVPR_2021_paper.html
- Code: https://github.com/kinredon/umt
**47. I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors**
- Affiliations: The University of Hong Kong, Xiamen University, Deepwise AI Lab
- Paper: https://arxiv.org/abs/2103.13757
- Code: None
## Self-Supervised Object Detection
**48. There Is More Than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking With Sound by Distilling Multimodal Knowledge**
- Affiliations: University of Freiburg
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Valverde_There_Is_More_Than_Meets_the_Eye_Self-Supervised_Multi-Object_Detection_CVPR_2021_paper.html
- Code: http://rl.uni-freiburg.de/research/multimodal-distill
**49. Instance Localization for Self-supervised Detection Pretraining**
- Affiliations: The Chinese University of Hong Kong, Microsoft Research Asia
- Paper: https://arxiv.org/abs/2102.08318
- Code: https://github.com/limbo0000/InstanceLoc
## Weakly Supervised Object Detection
**50. Informative and Consistent Correspondence Mining for Cross-Domain Weakly Supervised Object Detection**
- Affiliations: Beihang University, Peng Cheng Laboratory, SenseTime
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hou_Informative_and_Consistent_Correspondence_Mining_for_Cross-Domain_Weakly_Supervised_Object_CVPR_2021_paper.html
- Code: None
**51. DAP: Detection-Aware Pre-training with Weak Supervision**
- Affiliations: UIUC, Microsoft
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhong_DAP_Detection-Aware_Pre-Training_With_Weak_Supervision_CVPR_2021_paper.html
- Code: None
## Others
**52. Open-Vocabulary Object Detection Using Captions**
- Affiliations: Snap, Columbia University
- Paper(Oral): https://openaccess.thecvf.com/content/CVPR2021/html/Zareian_Open-Vocabulary_Object_Detection_Using_Captions_CVPR_2021_paper.html
- Code: https://github.com/alirezazareian/ovr-cnn
**53. Depth From Camera Motion and Object Detection**
- Affiliations: University of Michigan, SIAI
- Paper: https://arxiv.org/abs/2103.01468
- Code: https://github.com/griffbr/ODMD
- Dataset: https://github.com/griffbr/ODMD
**54. Unsupervised Object Detection With LIDAR Clues**
- Affiliations: SenseTime, University of Chinese Academy of Sciences, University of Science and Technology of China
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Tian_Unsupervised_Object_Detection_With_LIDAR_Clues_CVPR_2021_paper.html
- Code: None
**55. GAIA: A Transfer Learning System of Object Detection That Fits Your Needs**
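- Affiliations: University of Chinese Academy of Sciences, Beijing Institute of Technology, Chinese Academy of Sciences, SenseTime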
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Bu_GAIA_A_Transfer_Learning_System_of_Object_Detection_That_Fits_CVPR_2021_paper.html
- Code: https://github.com/GAIA-vision/GAIA-det
**56. General Instance Distillation for Object Detection**
- Affiliations: Megvii, Beihang University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Dai_General_Instance_Distillation_for_Object_Detection_CVPR_2021_paper.html
- Code: None
**57. AQD: Towards Accurate Quantized Object Detection**
- Affiliations: Monash University, University of Adelaide, South China University of Technology
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_AQD_Towards_Accurate_Quantized_Object_Detection_CVPR_2021_paper.html
- Code: https://github.com/aim-uofa/model-quantization
**58. Scale-Aware Automatic Augmentation for Object Detection**
- Affiliations: The Chinese University of Hong Kong, ByteDance AI Lab, SmartMore
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Scale-Aware_Automatic_Augmentation_for_Object_Detection_CVPR_2021_paper.html
- Code: https://github.com/Jia-Research-Lab/SA-AutoAug
**59. Equalization Loss v2: A New Gradient Balance Approach for Long-Tailed Object Detection**
- Affiliations: Tongji University, SenseTime, Tsinghua University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Tan_Equalization_Loss_v2_A_New_Gradient_Balance_Approach_for_Long-Tailed_CVPR_2021_paper.html
- Code: https://github.com/tztztztztz/eqlv2
**60. Class-Aware Robust Adversarial Training for Object Detection**
- Affiliations: Columbia University, Academia Sinica
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Class-Aware_Robust_Adversarial_Training_for_Object_Detection_CVPR_2021_paper.html
- Code: None
**61. Improved Handling of Motion Blur in Online Object Detection**
- Affiliations: University College London
- Homepage: http://visual.cs.ucl.ac.uk/pubs/handlingMotionBlur/
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sayed_Improved_Handling_of_Motion_Blur_in_Online_Object_Detection_CVPR_2021_paper.html
- Code: None
**62. Multiple Instance Active Learning for Object Detection**
- Affiliations: University of Chinese Academy of Sciences, Huawei Noah's Ark Lab
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Yuan_Multiple_Instance_Active_Learning_for_Object_Detection_CVPR_2021_paper.html
- Code: https://github.com/yuantn/MI-AOD
**63. Neural Auto-Exposure for High-Dynamic Range Object Detection**
- Affiliations: Algolux, Princeton University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Onzon_Neural_Auto-Exposure_for_High-Dynamic_Range_Object_Detection_CVPR_2021_paper.html
- Code: None
**64. Generalizable Pedestrian Detection: The Elephant in the Room**
- Affiliations: IIAI, Aalto University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hasan_Generalizable_Pedestrian_Detection_The_Elephant_in_the_Room_CVPR_2021_paper.html
- Code: https://github.com/hasanirtiza/Pedestron
# Single/Multi-Object Tracking
## Single Object Tracking
**LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search**
- Paper: https://arxiv.org/abs/2104.14545
- Code: https://github.com/researchmm/LightTrack
**Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark**
- Homepage: https://sites.google.com/view/langtrackbenchmark/
- Paper: https://arxiv.org/abs/2103.16746
- Evaluation Toolkit: https://github.com/wangxiao5791509/TNL2K_evaluation_toolkit
- Demo Video: https://www.youtube.com/watch?v=7lvVDlkkff0&ab_channel=XiaoWang
**IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking**
- Paper: https://arxiv.org/abs/2103.14938
- Code: https://github.com/VISION-SJTU/IoUattack
**Graph Attention Tracking**
- Paper: https://arxiv.org/abs/2011.11204
- Code: https://github.com/ohhhyeahhh/SiamGAT
**Rotation Equivariant Siamese Networks for Tracking**
- Paper: https://arxiv.org/abs/2012.13078
- Code: None
**Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking**
- Paper(Oral): https://arxiv.org/abs/2103.11681
- Code: https://github.com/594422814/TransformerTrack
**Transformer Tracking**
- Paper: https://arxiv.org/abs/2103.15436
- Code: https://github.com/chenxin-dlut/TransT
## Multi-Object Tracking
**Tracking Pedestrian Heads in Dense Crowd**
- Homepage: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sundararaman_Tracking_Pedestrian_Heads_in_Dense_Crowd_CVPR_2021_paper.html
- Code1: https://github.com/Sentient07/HeadHunter
- Code2: https://github.com/Sentient07/HeadHunter--T
- Dataset: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/
**Multiple Object Tracking with Correlation Learning**
- Paper: https://arxiv.org/abs/2104.03541
- Code: None
**Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking**
- Paper: https://arxiv.org/abs/2012.02337
- Code: None
**Learning a Proposal Classifier for Multiple Object Tracking**
- Paper: https://arxiv.org/abs/2103.07889
- Code: https://github.com/daip13/LPC_MOT.git
**Track to Detect and Segment: An Online Multi-Object Tracker**
- Homepage: https://jialianwu.com/projects/TraDeS.html
- Paper: https://arxiv.org/abs/2103.08808
- Code: https://github.com/JialianW/TraDeS
# Semantic Segmentation
**1. HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation**
- Affiliations: Facebook AI, Bar-Ilan University, Tel Aviv University
- Homepage: https://nirkin.com/hyperseg/
- Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Nirkin_HyperSeg_Patch-Wise_Hypernetwork_for_Real-Time_Semantic_Segmentation_CVPR_2021_paper.pdf
- Code: https://github.com/YuvalNirkin/hyperseg
**2. Rethinking BiSeNet For Real-time Semantic Segmentation**
- Affiliations: Meituan
- Paper: https://arxiv.org/abs/2104.13188
- Code: https://github.com/MichaelFan01/STDC-Seg
**3. Progressive Semantic Segmentation**
- Affiliations: VinAI Research, VinUniversity, University of Arkansas, Stony Brook University
- Paper: https://arxiv.org/abs/2104.03778
- Code: https://github.com/VinAIResearch/MagNet
**4. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers**
- Affiliations: Fudan University, University of Oxford, University of Surrey, Tencent Youtu Lab, Facebook AI
- Homepage: https://fudan-zvg.github.io/SETR
- Paper: https://arxiv.org/abs/2012.15840
- Code: https://github.com/fudan-zvg/SETR
**5. Capturing Omni-Range Context for Omnidirectional Segmentation**
- Affiliations: Karlsruhe Institute of Technology, Carl Zeiss, Huawei
- Paper: https://arxiv.org/abs/2103.05687
- Code: None
**6. Learning Statistical Texture for Semantic Segmentation**
- Affiliations: Beihang University, SenseTime
- Paper: https://arxiv.org/abs/2103.04133
- Code: None
**7. InverseForm: A Loss Function for Structured Boundary-Aware Segmentation**
- Affiliations: Qualcomm AI Research
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Borse_InverseForm_A_Loss_Function_for_Structured_Boundary-Aware_Segmentation_CVPR_2021_paper.html
- Code: None
**8. DCNAS: Densely Connected Neural Architecture Search for Semantic Image Segmentation**
- Affiliations: Joyy Inc, Kuaishou, Beihang University, et al.
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhang_DCNAS_Densely_Connected_Neural_Architecture_Search_for_Semantic_Image_Segmentation_CVPR_2021_paper.html
- Code: None
## Weakly Supervised Semantic Segmentation
**9. Railroad Is Not a Train: Saliency As Pseudo-Pixel Supervision for Weakly Supervised Semantic Segmentation**
- Affiliations: Yonsei University, Sungkyunkwan University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Lee_Railroad_Is_Not_a_Train_Saliency_As_Pseudo-Pixel_Supervision_for_CVPR_2021_paper.html
- Code: https://github.com/halbielee/EPS
**10. Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation**
- Affiliations: Yonsei University
- Homepage: https://cvlab.yonsei.ac.kr/projects/BANA/
- Paper: https://arxiv.org/abs/2104.00905
- Code: None
**11. Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation**
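- Affiliations: Nanjing University of Science and Technology, MBZUAI, University of Electronic Science and Technology of China, University of Adelaide, University of Technology Sydney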
- Paper: https://arxiv.org/abs/2103.14581
- Code: https://github.com/NUST-Machine-Intelligence-Laboratory/nsrom
**12. Embedded Discriminative Attention Mechanism for Weakly Supervised Semantic Segmentation**
- Affiliations: Beijing Institute of Technology, Meituan
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wu_Embedded_Discriminative_Attention_Mechanism_for_Weakly_Supervised_Semantic_Segmentation_CVPR_2021_paper.html
- Code: https://github.com/allenwu97/EDAM
**13. BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation**
- Affiliations: Seoul National University
- Paper: https://arxiv.org/abs/2103.08907
- Code: https://github.com/jbeomlee93/BBAM
## Semi-Supervised Semantic Segmentation
**14. Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision**
- Affiliations: Peking University, Microsoft Research Asia
- Paper: https://arxiv.org/abs/2106.01226
- Code: https://github.com/charlesCXK/TorchSemiSeg
**15. Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation**
- Affiliations: Huawei, Dalian University of Technology, Peking University
- Paper: https://arxiv.org/abs/2103.04705
- Code: None
**16. Semi-Supervised Semantic Segmentation With Directional Context-Aware Consistency**
- Affiliations: The Chinese University of Hong Kong, SmartMore, University of Oxford
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Lai_Semi-Supervised_Semantic_Segmentation_With_Directional_Context-Aware_Consistency_CVPR_2021_paper.html
- Code: None
**17. Semantic Segmentation With Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization**
- Affiliations: NVIDIA, University of Toronto, Yale University, MIT, Vector Institute
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Semantic_Segmentation_With_Generative_Models_Semi-Supervised_Learning_and_Strong_Out-of-Domain_CVPR_2021_paper.html
- Code: https://nv-tlabs.github.io/semanticGAN/
**18. Three Ways To Improve Semantic Segmentation With Self-Supervised Depth Estimation**
- Affiliations: ETH Zurich, University of Bern, KU Leuven
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hoyer_Three_Ways_To_Improve_Semantic_Segmentation_With_Self-Supervised_Depth_Estimation_CVPR_2021_paper.html
- Code: https://github.com/lhoyer/improving_segmentation_with_selfsupervised_depth
## Domain Adaptive Semantic Segmentation
**19. Cluster, Split, Fuse, and Update: Meta-Learning for Open Compound Domain Adaptive Semantic Segmentation**
- Affiliations: ETH Zurich, KU Leuven, University of Electronic Science and Technology of China
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Gong_Cluster_Split_Fuse_and_Update_Meta-Learning_for_Open_Compound_Domain_CVPR_2021_paper.html
- Code: None
**20. Source-Free Domain Adaptation for Semantic Segmentation**
- Affiliations: East China Normal University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_Source-Free_Domain_Adaptation_for_Semantic_Segmentation_CVPR_2021_paper.html
- Code: None
**21. Uncertainty Reduction for Model Adaptation in Semantic Segmentation**
- Affiliations: Idiap Research Institute, EPFL, University of Geneva
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/S_Uncertainty_Reduction_for_Model_Adaptation_in_Semantic_Segmentation_CVPR_2021_paper.html
- Code: https://git.io/JthPp
**22. Self-Supervised Augmentation Consistency for Adapting Semantic Segmentation**
- Affiliations: TU Darmstadt, hessian.AI
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Araslanov_Self-Supervised_Augmentation_Consistency_for_Adapting_Semantic_Segmentation_CVPR_2021_paper.html
- Code: https://github.com/visinf/da-sac
**23. RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening**
- Affiliations: LG AI Research, KAIST, et al.
- Paper: https://arxiv.org/abs/2103.15597
- Code: https://github.com/shachoi/RobustNet
**24. Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization**
- Affiliations: The University of Hong Kong, Deepwise
- Paper: https://arxiv.org/abs/2103.13041
- Code: None
**25. MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation**
- Affiliations: City University of Hong Kong, Baidu
- Paper: https://arxiv.org/abs/2103.05254
- Code: https://github.com/cyang-cityu/MetaCorrection
**26. Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation**
- Affiliations: Huawei Cloud, Huawei Noah's Ark Lab, Dalian University of Technology
- Paper: https://arxiv.org/abs/2103.04717
- Code: None
**27. Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation**
- Affiliations: University of Science and Technology of China, Microsoft Research Asia
- Paper: https://arxiv.org/abs/2101.10979
- Code: https://github.com/microsoft/ProDA
**28. DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation**
- Affiliations: University of South Carolina, Farsee2 Technology (天远视科技)
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wu_DANNet_A_One-Stage_Domain_Adaptation_Network_for_Unsupervised_Nighttime_Semantic_CVPR_2021_paper.html
- Code: https://github.com/W-zx-Y/DANNet
## Few-Shot Semantic Segmentation
**29. Scale-Aware Graph Neural Network for Few-Shot Semantic Segmentation**
- Affiliations: MBZUAI, IIAI, Harbin Institute of Technology
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Xie_Scale-Aware_Graph_Neural_Network_for_Few-Shot_Semantic_Segmentation_CVPR_2021_paper.html
- Code: None
**30. Anti-Aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation**
- Affiliations: University of Chinese Academy of Sciences, Tsinghua University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_Anti-Aliasing_Semantic_Reconstruction_for_Few-Shot_Semantic_Segmentation_CVPR_2021_paper.html
- Code: https://github.com/Bibkiller/ASR
## Unsupervised Semantic Segmentation
**31. PiCIE: Unsupervised Semantic Segmentation Using Invariance and Equivariance in Clustering**
- Affiliations: UT-Austin, Cornell University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Cho_PiCIE_Unsupervised_Semantic_Segmentation_Using_Invariance_and_Equivariance_in_Clustering_CVPR_2021_paper.html
- Code: https://github.com/janghyuncho/PiCIE
## Video Semantic Segmentation
**32. VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild**
- Affiliations: Zhejiang University, Baidu, University of Technology Sydney
- Homepage: https://www.vspwdataset.com/
- Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf
- GitHub: https://github.com/sssdddwww2/vspw_dataset_download
## Others
**33. Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations**
- Affiliations: University of Padova
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Michieli_Continual_Semantic_Segmentation_via_Repulsion-Attraction_of_Sparse_and_Disentangled_Latent_CVPR_2021_paper.html
- Code: https://lttm.dei.unipd.it/paper_data/SDR/
**34. Exploit Visual Dependency Relations for Semantic Segmentation**
- Affiliations: University of Illinois at Chicago
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_Exploit_Visual_Dependency_Relations_for_Semantic_Segmentation_CVPR_2021_paper.html
- Code: None
**35. Revisiting Superpixels for Active Learning in Semantic Segmentation With Realistic Annotation Costs**
- Affiliations: Institute for Infocomm Research, National University of Singapore
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Cai_Revisiting_Superpixels_for_Active_Learning_in_Semantic_Segmentation_With_Realistic_CVPR_2021_paper.html
- Code: None
**36. PLOP: Learning without Forgetting for Continual Semantic Segmentation**
- Affiliations: Sorbonne University, Heuritech, Datakalab, Valeo.ai
- Paper: https://arxiv.org/abs/2011.11390
- Code: https://github.com/arthurdouillard/CVPR2021_PLOP
**37. 3D-to-2D Distillation for Indoor Scene Parsing**
- Affiliations: The Chinese University of Hong Kong, The University of Hong Kong
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_3D-to-2D_Distillation_for_Indoor_Scene_Parsing_CVPR_2021_paper.html
- Code: None
**38. Bidirectional Projection Network for Cross Dimension Scene Understanding**
- Affiliations: The Chinese University of Hong Kong, University of Oxford, et al.
- Paper(Oral): https://arxiv.org/abs/2103.14326
- Code: https://github.com/wbhu/BPNet
**39. PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation**
- Affiliations: Peking University, Chinese Academy of Sciences, University of Chinese Academy of Sciences, ETH Zurich, SenseTime, et al.
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_PointFlow_Flowing_Semantics_Through_Points_for_Aerial_Image_Segmentation_CVPR_2021_paper.html
- Code: https://github.com/lxtGH/PFSegNets
# Instance Segmentation
**DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation**
- Paper: https://arxiv.org/abs/2011.09876
- Code: https://github.com/aliyun/DCT-Mask
**Incremental Few-Shot Instance Segmentation**
- Paper: https://arxiv.org/abs/2105.05312
- Code: https://github.com/danganea/iMTFA
**A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation**
- Paper: https://arxiv.org/abs/2105.03186
- Code: None
**RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features**
- Paper: https://arxiv.org/abs/2104.08569
- Code: https://github.com/zhanggang001/RefineMask/
**Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation**
- Paper: https://arxiv.org/abs/2104.05239
- Code: https://github.com/tinyalpha/BPR
**Multi-Scale Aligned Distillation for Low-Resolution Detection**
- Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf
- Code: https://github.com/Jia-Research-Lab/MSAD
**Boundary IoU: Improving Object-Centric Image Segmentation Evaluation**
- Homepage: https://bowenc0221.github.io/boundary-iou/
- Paper: https://arxiv.org/abs/2103.16562
- Code: https://github.com/bowenc0221/boundary-iou-api
**Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers**
- Paper: https://arxiv.org/abs/2103.12340
- Code: https://github.com/lkeab/BCNet
**Zero-shot instance segmentation(Not Sure)**
- Paper: None
- Code: https://github.com/CVPR2021-pape-id-1395/CVPR2021-paper-id-1395
## Video Instance Segmentation
**STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation**
- Paper: http://www4.comp.polyu.edu.hk/~cslzhang/papers.htm
- Code: https://github.com/MinghanLi/STMask
**End-to-End Video Instance Segmentation with Transformers**
- Paper(Oral): https://arxiv.org/abs/2011.14503
- Code: https://github.com/Epiphqny/VisTR
# Panoptic Segmentation
**ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation**
- Paper: https://arxiv.org/abs/2012.05258
- Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab
- Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab
**Part-aware Panoptic Segmentation**
- Paper: https://arxiv.org/abs/2106.06351
- Code: https://github.com/tue-mps/panoptic_parts
- Dataset: https://github.com/tue-mps/panoptic_parts
**Exemplar-Based Open-Set Panoptic Segmentation Network**
- Homepage: https://cv.snu.ac.kr/research/EOPSN/
- Paper: https://arxiv.org/abs/2105.08336
- Code: https://github.com/jd730/EOPSN
**MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers**
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_MaX-DeepLab_End-to-End_Panoptic_Segmentation_With_Mask_Transformers_CVPR_2021_paper.html
- Code: None
**Panoptic Segmentation Forecasting**
- Paper: https://arxiv.org/abs/2104.03962
- Code: https://github.com/nianticlabs/panoptic-forecasting
**Fully Convolutional Networks for Panoptic Segmentation**
- Paper: https://arxiv.org/abs/2012.00720
- Code: https://github.com/yanwei-li/PanopticFCN
**Cross-View Regularization for Domain Adaptive Panoptic Segmentation**
- Paper: https://arxiv.org/abs/2103.02584
- Code: None
# Medical Image Segmentation
**1. Learning Calibrated Medical Image Segmentation via Multi-Rater Agreement Modeling**
- Affiliations: Tencent Jarvis Lab, Beijing Tongren Hospital
- Paper(Best Paper Candidate): https://openaccess.thecvf.com/content/CVPR2021/html/Ji_Learning_Calibrated_Medical_Image_Segmentation_via_Multi-Rater_Agreement_Modeling_CVPR_2021_paper.html
- Code: https://github.com/jiwei0921/MRNet/
**2. Every Annotation Counts: Multi-Label Deep Supervision for Medical Image Segmentation**
- Affiliations: Karlsruhe Institute of Technology, Carl Zeiss, et al.
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Reiss_Every_Annotation_Counts_Multi-Label_Deep_Supervision_for_Medical_Image_Segmentation_CVPR_2021_paper.html
- Code: None
**3. FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space**
- Affiliations: The Chinese University of Hong Kong, The Hong Kong Polytechnic University
- Paper: https://arxiv.org/abs/2103.06030
- Code: https://github.com/liuquande/FedDG-ELCFS
**4. DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation**
- Affiliations: Johns Hopkins University, NVIDIA
- Paper(Oral): https://arxiv.org/abs/2103.15954
- Code: None
**5. DARCNN: Domain Adaptive Region-Based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images**
- Affiliations: Stanford University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hsu_DARCNN_Domain_Adaptive_Region-Based_Convolutional_Neural_Network_for_Unsupervised_Instance_CVPR_2021_paper.html
- Code: None
# Video Object Segmentation
**Learning Position and Target Consistency for Memory-based Video Object Segmentation**
- Paper: https://arxiv.org/abs/2104.04329
- Code: None
**SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation**
- Paper(Oral): https://arxiv.org/abs/2101.08833
- Code: https://github.com/dukebw/SSTVOS
# Interactive Video Object Segmentation
**Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion**
- Homepage: https://hkchengrex.github.io/MiVOS/
- Paper: https://arxiv.org/abs/2103.07941
- Code: https://github.com/hkchengrex/MiVOS
- Demo: https://hkchengrex.github.io/MiVOS/video.html#partb
**Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild**
- Paper: https://arxiv.org/abs/2103.10391
- Code: https://github.com/svip-lab/IVOS-W
# Saliency Detection
**Uncertainty-aware Joint Salient Object and Camouflaged Object Detection**
- Paper: https://arxiv.org/abs/2104.02628
- Code: https://github.com/JingZhang617/Joint_COD_SOD
**Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion**
- Paper(Oral): https://arxiv.org/abs/2103.11832
- Code: https://github.com/sunpeng1996/DSA2F
# Camouflaged Object Detection
**Uncertainty-aware Joint Salient Object and Camouflaged Object Detection**
- Paper: https://arxiv.org/abs/2104.02628
- Code: https://github.com/JingZhang617/Joint_COD_SOD
# Co-Salient Object Detection
**Group Collaborative Learning for Co-Salient Object Detection**
- Paper: https://arxiv.org/abs/2104.01108
- Code: https://github.com/fanq15/GCoNet
# Image Matting
**Semantic Image Matting**
- Paper: https://arxiv.org/abs/2104.08201
- Code: https://github.com/nowsyn/SIM
- Dataset: https://github.com/nowsyn/SIM
# Person Re-identification
**Generalizable Person Re-identification with Relevance-aware Mixture of Experts**
- Paper: https://arxiv.org/abs/2105.09156
- Code: None
**Unsupervised Multi-Source Domain Adaptation for Person Re-Identification**
- Paper: https://arxiv.org/abs/2104.12961
- Code: None
**Combined Depth Space based Architecture Search For Person Re-identification**
- Paper: https://arxiv.org/abs/2104.04163
- Code: None
# Person Search
**Anchor-Free Person Search**
- Paper: https://arxiv.org/abs/2103.11617
- Code: https://github.com/daodaofr/AlignPS
- Interpretation: [The first anchor-free person search framework | CVPR 2021](https://mp.weixin.qq.com/s/iqJkgp0JBanmeBPyHUkb-A)
# Video Understanding / Action Recognition
**Temporal-Relational CrossTransformers for Few-Shot Action Recognition**
- Paper: https://arxiv.org/abs/2101.06184
- Code: https://github.com/tobyperrett/trx
**FrameExit: Conditional Early Exiting for Efficient Video Recognition**
- Paper(Oral): https://arxiv.org/abs/2104.13400
- Code: None
**No frame left behind: Full Video Action Recognition**
- Paper: https://arxiv.org/abs/2103.15395
- Code: None
**Learning Salient Boundary Feature for Anchor-free Temporal Action Localization**
- Paper: https://arxiv.org/abs/2103.13137
- Code: None
**Temporal Context Aggregation Network for Temporal Action Proposal Refinement**
- Paper: https://arxiv.org/abs/2103.13141
- Code: None
- Interpretation: [CVPR 2021 | TCANet: the strongest temporal action proposal refinement network](https://mp.weixin.qq.com/s/UOWMfpTljkyZznHtpkQBhA)
**ACTION-Net: Multipath Excitation for Action Recognition**
- Paper: https://arxiv.org/abs/2103.07372
- Code: https://github.com/V-Sense/ACTION-Net
**Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning**
- Homepage: https://fingerrec.github.io/index_files/jinpeng/papers/CVPR2021/project_website.html
- Paper: https://arxiv.org/abs/2009.05769
- Code: https://github.com/FingerRec/BE
**TDN: Temporal Difference Networks for Efficient Action Recognition**
- Paper: https://arxiv.org/abs/2012.10071
- Code: https://github.com/MCG-NJU/TDN
# Face Recognition
**A 3D GAN for Improved Large-pose Facial Recognition**
- Paper: https://arxiv.org/abs/2012.10545
- Code: None
**MagFace: A Universal Representation for Face Recognition and Quality Assessment**
- Paper(Oral): https://arxiv.org/abs/2103.06627
- Code: https://github.com/IrvingMeng/MagFace
**WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition**
- Homepage: https://www.face-benchmark.org/
- Paper: https://arxiv.org/abs/2103.04098
- Dataset: https://www.face-benchmark.org/
**When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework**
- Paper(Oral): https://arxiv.org/abs/2103.01520
- Code: https://github.com/Hzzone/MTLFace
- Dataset: https://github.com/Hzzone/MTLFace
# Face Detection
**HLA-Face: Joint High-Low Adaptation for Low Light Face Detection**
- Homepage: https://daooshee.github.io/HLA-Face-Website/
- Paper: https://arxiv.org/abs/2104.01984
- Code: https://github.com/daooshee/HLA-Face-Code
**CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement**
- Paper: https://arxiv.org/abs/2103.07017
- Code: None
# Face Anti-Spoofing
**Cross Modal Focal Loss for RGBD Face Anti-Spoofing**
- Paper: https://arxiv.org/abs/2103.00948
- Code: None
# Deepfake Detection
**Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain**
- Paper: https://arxiv.org/abs/2103.01856
- Code: None
**Multi-attentional Deepfake Detection**
- Paper: https://arxiv.org/abs/2103.02406
- Code: None
# Face Age Estimation
**Continuous Face Aging via Self-estimated Residual Age Embedding**
- Paper: https://arxiv.org/abs/2105.00020
- Code: None
**PML: Progressive Margin Loss for Long-tailed Age Classification**
- Paper: https://arxiv.org/abs/2103.02140
- Code: None
# Facial Expression Recognition
**Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition**
- Paper: https://arxiv.org/abs/2103.13372
- Code: None
# Deepfakes
**MagDR: Mask-guided Detection and Reconstruction for Defending Deepfakes**
- Paper: https://arxiv.org/abs/2103.14211
- Code: None
# Human Parsing
**Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing**
- Paper: https://arxiv.org/abs/2103.04570
- Code: https://github.com/tfzhou/MG-HumanParsing
# 2D/3D Human Pose Estimation
## 2D Human Pose Estimation
**ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search**
- Paper: https://arxiv.org/abs/2105.10154
- Code: None
**When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks**
- Paper: https://arxiv.org/abs/2105.06152
- Code: None
**Pose Recognition with Cascade Transformers**
- Paper: https://arxiv.org/abs/2104.06976
- Code: https://github.com/mlpc-ucsd/PRTR
**DCPose: Deep Dual Consecutive Network for Human Pose Estimation**
- Paper: https://arxiv.org/abs/2103.07254
- Code: https://github.com/Pose-Group/DCPose
## 3D Human Pose Estimation
**End-to-End Human Pose and Mesh Reconstruction with Transformers**
- Paper: https://arxiv.org/abs/2012.09760
- Code: https://github.com/microsoft/MeshTransformer
**PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation**
- Paper(Oral): https://arxiv.org/abs/2105.02465
- Code: https://github.com/jfzhang95/PoseAug
**Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration**
- Paper: https://arxiv.org/abs/2103.02845
- Code: https://github.com/SeanChenxy/HandMesh
**Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks**
- Paper: https://arxiv.org/abs/2104.01797
- Code: https://github.com/3dpose/3D-Multi-Person-Pose
**HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation**
- Homepage: https://jeffli.site/HybrIK/
- Paper: https://arxiv.org/abs/2011.14672
- Code: https://github.com/Jeff-sjtu/HybrIK
# Animal Pose Estimation
**From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation**
- Paper: https://arxiv.org/abs/2103.14843
- Code: None
# Hand Pose Estimation
**Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time**
- Homepage: https://stevenlsw.github.io/Semi-Hand-Object/
- Paper: https://arxiv.org/abs/2106.05266
- Code: https://github.com/stevenlsw/Semi-Hand-Object
# Human Volumetric Capture
**POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture**
- Homepage: http://www.liuyebin.com/posefusion/posefusion.html
- Paper(Oral): https://arxiv.org/abs/2103.15331
- Code: None
# Scene Text Detection
**Fourier Contour Embedding for Arbitrary-Shaped Text Detection**
- Paper: https://arxiv.org/abs/2104.10442
- Code: None
# Scene Text Recognition
**Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition**
- Paper: https://arxiv.org/abs/2103.06495
- Code: https://github.com/FangShancheng/ABINet
# Image Compression
**Checkerboard Context Model for Efficient Learned Image Compression**
- Paper: https://arxiv.org/abs/2103.15306
- Code: None
**Slimmable Compressive Autoencoders for Practical Neural Image Compression**
- Paper: https://arxiv.org/abs/2103.15726
- Code: None
**Attention-guided Image Compression by Deep Reconstruction of Compressive Sensed Saliency Skeleton**
- Paper: https://arxiv.org/abs/2103.15368
- Code: None
# Model Compression/Pruning/Quantization
**Teachers Do More Than Teach: Compressing Image-to-Image Models**
- Paper: https://arxiv.org/abs/2103.03467
- Code: https://github.com/snap-research/CAT
## Model Pruning
**Dynamic Slimmable Network**
- Paper: https://arxiv.org/abs/2103.13258
- Code: https://github.com/changlin31/DS-Net
## Model Quantization
**Network Quantization with Element-wise Gradient Scaling**
- Paper: https://arxiv.org/abs/2104.00903
- Code: None
**Zero-shot Adversarial Quantization**
- Paper(Oral): https://arxiv.org/abs/2103.15263
- Code: https://git.io/Jqc0y
**Learnable Companding Quantization for Accurate Low-bit Neural Networks**
- Paper: https://arxiv.org/abs/2103.07156
- Code: None
# Knowledge Distillation
**Distilling Knowledge via Knowledge Review**
- Paper: https://arxiv.org/abs/2104.09044
- Code: https://github.com/Jia-Research-Lab/ReviewKD
**Distilling Object Detectors via Decoupled Features**
- Paper: https://arxiv.org/abs/2103.14475
- Code: https://github.com/ggjy/DeFeat.pytorch
# Super-Resolution
**Image Super-Resolution with Non-Local Sparse Attention**
- Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Mei_Image_Super-Resolution_With_Non-Local_Sparse_Attention_CVPR_2021_paper.pdf
- Code: https://github.com/HarukiYqM/Non-Local-Sparse-Attention
**Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline**
- Homepage: http://mepro.bjtu.edu.cn/resource.html
- Paper: https://arxiv.org/abs/2104.06174
- Code: None
**ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic**
- Paper: https://arxiv.org/abs/2103.04039
- Code: https://github.com/Xiangtaokong/ClassSR
**AdderSR: Towards Energy Efficient Image Super-Resolution**
- Paper: https://arxiv.org/abs/2009.08891
- Code: None
# Dehazing
**Contrastive Learning for Compact Single Image Dehazing**
- Paper: https://arxiv.org/abs/2104.09367
- Code: https://github.com/GlassyWu/AECR-Net
## Video Super-Resolution
**Temporal Modulation Network for Controllable Space-Time Video Super-Resolution**
- Paper: None
- Code: https://github.com/CS-GangXu/TMNet
# Image Restoration
**Multi-Stage Progressive Image Restoration**
- Paper: https://arxiv.org/abs/2102.02808
- Code: https://github.com/swz30/MPRNet
# Image Inpainting
**PD-GAN: Probabilistic Diverse GAN for Image Inpainting**
- Paper: https://arxiv.org/abs/2105.02201
- Code: https://github.com/KumapowerLIU/PD-GAN
**TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations**
- Homepage: https://yzhouas.github.io/projects/TransFill/index.html
- Paper: https://arxiv.org/abs/2103.15982
- Code: None
# Image Editing
**StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing**
- Paper: https://arxiv.org/abs/2104.14754
- Code: https://github.com/naver-ai/StyleMapGAN
- Demo Video: https://youtu.be/qCapNyRA_Ng
**High-Fidelity and Arbitrary Face Editing**
- Paper: https://arxiv.org/abs/2103.15814
- Code: None
**Anycost GANs for Interactive Image Synthesis and Editing**
- Paper: https://arxiv.org/abs/2103.03243
- Code: https://github.com/mit-han-lab/anycost-gan
**PISE: Person Image Synthesis and Editing with Decoupled GAN**
- Paper: https://arxiv.org/abs/2103.04023
- Code: https://github.com/Zhangjinso/PISE
**DeFLOCNet: Deep Image Editing via Flexible Low-level Controls**
- Paper: http://raywzy.com/
- Code: http://raywzy.com/
# Image Captioning
**Towards Accurate Text-based Image Captioning with Content Diversity Exploration**
- Paper: https://arxiv.org/abs/2105.03236
- Code: None
# Font Generation
**DG-Font: Deformable Generative Networks for Unsupervised Font Generation**
- Paper: https://arxiv.org/abs/2104.03064
- Code: https://github.com/ecnuycxie/DG-Font
# Image Matching
**LoFTR: Detector-Free Local Feature Matching with Transformers**
- Homepage: https://zju3dv.github.io/loftr/
- Paper: https://arxiv.org/abs/2104.00680
- Code: https://github.com/zju3dv/LoFTR
**Convolutional Hough Matching Networks**
- Homepage: http://cvlab.postech.ac.kr/research/CHM/
- Paper(Oral): https://arxiv.org/abs/2103.16831
- Code: None
# Image Blending
**Bridging the Visual Gap: Wide-Range Image Blending**
- Paper: https://arxiv.org/abs/2103.15149
- Code: https://github.com/julia0607/Wide-Range-Image-Blending
# Reflection Removal
**Robust Reflection Removal with Reflection-free Flash-only Cues**
- Paper: https://arxiv.org/abs/2103.04273
- Code: https://github.com/ChenyangLEI/flash-reflection-removal
# 3D Point Cloud Classification
**Equivariant Point Network for 3D Point Cloud Analysis**
- Paper: https://arxiv.org/abs/2103.14147
- Code: None
**PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds**
- Paper: https://arxiv.org/abs/2103.14635
- Code: https://github.com/CVMI-Lab/PAConv
# 3D Object Detection
**3D-MAN: 3D Multi-frame Attention Network for Object Detection**
- Paper: https://arxiv.org/abs/2103.16054
- Code: None
**Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds**
- Paper: https://arxiv.org/abs/2104.06114
- Code: https://github.com/cheng052/BRNet
**HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection**
- Homepage: https://cvlab.yonsei.ac.kr/projects/HVPR/
- Paper: https://arxiv.org/abs/2104.00902
- Code: https://github.com/cvlab-yonsei/HVPR
**LiDAR R-CNN: An Efficient and Universal 3D Object Detector**
- Paper: https://arxiv.org/abs/2103.15297
- Code: https://github.com/tusimple/LiDAR_RCNN
**M3DSSD: Monocular 3D Single Stage Object Detector**
- Paper: https://arxiv.org/abs/2103.13164
- Code: https://github.com/mumianyuxin/M3DSSD
**SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud**
- Paper: None
- Code: https://github.com/Vegeta2020/SE-SSD
**Center-based 3D Object Detection and Tracking**
- Paper: https://arxiv.org/abs/2006.11275
- Code: https://github.com/tianweiy/CenterPoint
**Categorical Depth Distribution Network for Monocular 3D Object Detection**
- Paper: https://arxiv.org/abs/2103.01100
- Code: None
# 3D Semantic Segmentation
**Bidirectional Projection Network for Cross Dimension Scene Understanding**
- Paper(Oral): https://arxiv.org/abs/2103.14326
- Code: https://github.com/wbhu/BPNet
**Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion**
- Paper: https://arxiv.org/abs/2103.07074
- Code: https://github.com/ShiQiu0419/BAAF-Net
**Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation**
- Paper: https://arxiv.org/abs/2011.10033
- Code: https://github.com/xinge008/Cylinder3D
**Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges**
- Homepage: https://github.com/QingyongHu/SensatUrban
- Paper: http://arxiv.org/abs/2009.03137
- Code: https://github.com/QingyongHu/SensatUrban
- Dataset: https://github.com/QingyongHu/SensatUrban
# 3D Panoptic Segmentation
**Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation**
- Paper: https://arxiv.org/abs/2103.14962
- Code: https://github.com/edwardzhou130/Panoptic-PolarNet
# 3D Object Tracking
**Center-based 3D Object Detection and Tracking**
- Paper: https://arxiv.org/abs/2006.11275
- Code: https://github.com/tianweiy/CenterPoint
# 3D Point Cloud Registration
**ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning**
- Paper: https://arxiv.org/abs/2103.15231
- Code: None
**PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency**
- Paper: https://arxiv.org/abs/2103.05465
- Code: https://github.com/XuyangBai/PointDSC
**PREDATOR: Registration of 3D Point Clouds with Low Overlap**
- Paper: https://arxiv.org/abs/2011.13005
- Code: https://github.com/ShengyuH/OverlapPredator
# 3D Point Cloud Completion
**Unsupervised 3D Shape Completion through GAN Inversion**
- Homepage: https://junzhezhang.github.io/projects/ShapeInversion/
- Paper: https://arxiv.org/abs/2104.13366
- Code: https://github.com/junzhezhang/shape-inversion
**Variational Relational Point Completion Network**
- Homepage: https://paul007pl.github.io/projects/VRCNet
- Paper: https://arxiv.org/abs/2104.10154
- Code: https://github.com/paul007pl/VRCNet
**Style-based Point Generator with Adversarial Rendering for Point Cloud Completion**
- Homepage: https://alphapav.github.io/SpareNet/
- Paper: https://arxiv.org/abs/2103.02535
- Code: https://github.com/microsoft/SpareNet
# 3D Reconstruction
**Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection**
- Paper: http://arxiv.org/abs/2106.07852
- Code: https://github.com/TencentYoutuResearch/3DFaceReconstruction-LAP
**Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction**
- Paper: https://arxiv.org/abs/2104.00858
- Code: None
**NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video**
- Homepage: https://zju3dv.github.io/neuralrecon/
- Paper(Oral): https://arxiv.org/abs/2104.00681
- Code: https://github.com/zju3dv/NeuralRecon
# 6D Pose Estimation
**FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism**
- Paper(Oral): https://arxiv.org/abs/2103.07054
- Code: https://github.com/DC1991/FS-Net
**GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation**
- Paper: http://arxiv.org/abs/2102.12145
- Code: https://git.io/GDR-Net
**FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation**
- Paper: https://arxiv.org/abs/2103.02242
- Code: https://github.com/ethnhe/FFB6D
# Camera Pose Estimation
**Back to the Feature: Learning Robust Camera Localization from Pixels to Pose**
- Paper: https://arxiv.org/abs/2103.09213
- Code: https://github.com/cvg/pixloc
# Depth Estimation
**S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation**
- Paper(Oral): https://arxiv.org/abs/2104.00877
- Code: None
**Beyond Image to Depth: Improving Depth Prediction using Echoes**
- Homepage: https://krantiparida.github.io/projects/bimgdepth.html
- Paper: https://arxiv.org/abs/2103.08468
- Code: https://github.com/krantiparida/beyond-image-to-depth
**S3: Learnable Sparse Signal Superdensity for Guided Depth Estimation**
- Paper: https://arxiv.org/abs/2103.02396
- Code: None
**Depth from Camera Motion and Object Detection**
- Paper: https://arxiv.org/abs/2103.01468
- Code: https://github.com/griffbr/ODMD
- Dataset: https://github.com/griffbr/ODMD
# Stereo Matching
**A Decomposition Model for Stereo Matching**
- Paper: https://arxiv.org/abs/2104.07516
- Code: None
# Flow Estimation
**Self-Supervised Multi-Frame Monocular Scene Flow**
- Paper: https://arxiv.org/abs/2105.02216
- Code: https://github.com/visinf/multi-mono-sf
**RAFT-3D: Scene Flow using Rigid-Motion Embeddings**
- Paper: https://arxiv.org/abs/2012.00726v1
- Code: None
**Learning Optical Flow From Still Images**
- Homepage: https://mattpoggi.github.io/projects/cvpr2021aleotti/
- Paper: https://mattpoggi.github.io/assets/papers/aleotti2021cvpr.pdf
- Code: https://github.com/mattpoggi/depthstillation
**FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds**
- Paper: https://arxiv.org/abs/2104.00798
- Code: None
# Lane Detection
**Focus on Local: Detecting Lane Marker from Bottom Up via Key Point**
- Paper: https://arxiv.org/abs/2105.13680
- Code: None
**Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection**
- Paper: https://arxiv.org/abs/2010.12035
- Code: https://github.com/lucastabelini/LaneATT
# Trajectory Prediction
**Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction**
- Paper(Oral): https://arxiv.org/abs/2104.08277
- Code: None
# Crowd Counting
**Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark**
- Paper: https://arxiv.org/abs/2105.02440
- Code: https://github.com/VisDrone/DroneCrowd
- Dataset: https://github.com/VisDrone/DroneCrowd
# Adversarial Examples
**Enhancing the Transferability of Adversarial Attacks through Variance Tuning**
- Paper: https://arxiv.org/abs/2103.15571
- Code: https://github.com/JHL-HUST/VT
**LiBRe: A Practical Bayesian Approach to Adversarial Detection**
- Paper: https://arxiv.org/abs/2103.14835
- Code: None
**Natural Adversarial Examples**
- Paper: https://arxiv.org/abs/1907.07174
- Code: https://github.com/hendrycks/natural-adv-examples
# Image Retrieval
**StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval**
- Paper: https://arxiv.org/abs/2103.15706
- Code: None
**QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval**
- Paper: https://arxiv.org/abs/2103.02927
- Code: None
# Video Retrieval
**On Semantic Similarity in Video Retrieval**
- Paper: https://arxiv.org/abs/2103.10095
- Homepage: https://mwray.github.io/SSVR/
- Code: https://github.com/mwray/Semantic-Video-Retrieval
# Cross-modal Retrieval
**Cross-Modal Center Loss for 3D Cross-Modal Retrieval**
- Paper: https://arxiv.org/abs/2008.03561
- Code: https://github.com/LongLong-Jing/Cross-Modal-Center-Loss
**Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers**
- Paper: https://arxiv.org/abs/2103.16553
- Code: None
**Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning**
- Paper: https://www.amazon.science/publications/revamping-cross-modal-recipe-retrieval-with-hierarchical-transformers-and-self-supervised-learning
- Code: https://github.com/amzn/image-to-recipe-transformers
# Zero-Shot Learning
**Counterfactual Zero-Shot and Open-Set Visual Recognition**
- Paper: https://arxiv.org/abs/2103.00887
- Code: https://github.com/yue-zhongqi/gcm-cf
# Federated Learning
**FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space**
- Paper: https://arxiv.org/abs/2103.06030
- Code: https://github.com/liuquande/FedDG-ELCFS
# Video Frame Interpolation
**CDFI: Compression-Driven Network Design for Frame Interpolation**
- Paper: None
- Code: https://github.com/tding1/CDFI
**FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation**
- Homepage: https://tarun005.github.io/FLAVR/
- Paper: https://arxiv.org/abs/2012.08512
- Code: https://github.com/tarun005/FLAVR
# Visual Reasoning
**Transformation Driven Visual Reasoning**
- Homepage: https://hongxin2019.github.io/TVR/
- Paper: https://arxiv.org/abs/2011.13160
- Code: https://github.com/hughplay/TVR
# Image Synthesis
**GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields**
- Homepage: https://m-niemeyer.github.io/project-pages/giraffe/index.html
- Paper(Oral): https://arxiv.org/abs/2011.12100
- Code: https://github.com/autonomousvision/giraffe
- Demo: http://www.youtube.com/watch?v=fIaDXC-qRSg&vq=hd1080&autoplay=1
**Taming Transformers for High-Resolution Image Synthesis**
- Homepage: https://compvis.github.io/taming-transformers/
- Paper(Oral): https://arxiv.org/abs/2012.09841
- Code: https://github.com/CompVis/taming-transformers
# View Synthesis
**Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes**
- Homepage: https://virtualhumans.mpi-inf.mpg.de/srf/
- Paper: https://arxiv.org/abs/2104.06935
**Self-Supervised Visibility Learning for Novel View Synthesis**
- Paper: https://arxiv.org/abs/2103.15407
- Code: None
**NeX: Real-time View Synthesis with Neural Basis Expansion**
- Homepage: https://nex-mpi.github.io/
- Paper(Oral): https://arxiv.org/abs/2103.05606
# Style Transfer
**Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer**
- Paper: https://arxiv.org/abs/2104.05376
- Code: https://github.com/PaddlePaddle/PaddleGAN/
# Layout Generation
**LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity**
- Paper: None
- Code: None
**Variational Transformer Networks for Layout Generation**
- Paper: https://arxiv.org/abs/2104.02416
- Code: None
# Domain Generalization
**Generalization on Unseen Domains via Inference-time Label-Preserving Target Projections**
- Paper(Oral): https://openaccess.thecvf.com/content/CVPR2021/papers/Pandey_Generalization_on_Unseen_Domains_via_Inference-Time_Label-Preserving_Target_Projections_CVPR_2021_paper.pdf
- Code: https://github.com/VSumanth99/InferenceTimeDG
**Generalizable Person Re-identification with Relevance-aware Mixture of Experts**
- Paper: https://arxiv.org/abs/2105.09156
- Code: None
**RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening**
- Paper: https://arxiv.org/abs/2103.15597
- Code: https://github.com/shachoi/RobustNet
**Adaptive Methods for Real-World Domain Generalization**
- Paper: https://arxiv.org/abs/2103.15796
- Code: None
**FSDR: Frequency Space Domain Randomization for Domain Generalization**
- Paper: https://arxiv.org/abs/2103.02370
- Code: None
# Domain Adaptation
**Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation**
- Paper: https://arxiv.org/abs/2104.00808
- Code: None
**Domain Consensus Clustering for Universal Domain Adaptation**
- Paper: http://reler.net/papers/guangrui_cvpr2021.pdf
- Code: https://github.com/Solacex/Domain-Consensus-Clustering
# Open-Set
**Towards Open World Object Detection**
- Paper(Oral): https://arxiv.org/abs/2103.02603
- Code: https://github.com/JosephKJ/OWOD
**Exemplar-Based Open-Set Panoptic Segmentation Network**
- Homepage: https://cv.snu.ac.kr/research/EOPSN/
- Paper: https://arxiv.org/abs/2105.08336
- Code: https://github.com/jd730/EOPSN
**Learning Placeholders for Open-Set Recognition**
- Paper(Oral): https://arxiv.org/abs/2103.15086
- Code: None
# Adversarial Attack
**IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking**
- Paper: https://arxiv.org/abs/2103.14938
- Code: https://github.com/VISION-SJTU/IoUattack
# Human-Object Interaction (HOI) Detection
**HOTR: End-to-End Human-Object Interaction Detection with Transformers**
- Paper: https://arxiv.org/abs/2104.13682
- Code: None
**Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information**
- Paper: https://arxiv.org/abs/2103.05399
- Code: https://github.com/hitachi-rd-cv/qpic
**Reformulating HOI Detection as Adaptive Set Prediction**
- Paper: https://arxiv.org/abs/2103.05983
- Code: https://github.com/yoyomimi/AS-Net
**Detecting Human-Object Interaction via Fabricated Compositional Learning**
- Paper: https://arxiv.org/abs/2103.08214
- Code: https://github.com/zhihou7/FCL
**End-to-End Human Object Interaction Detection with HOI Transformer**
- Paper: https://arxiv.org/abs/2103.04503
- Code: https://github.com/bbepoch/HoiTransformer
# Shadow Removal
**Auto-Exposure Fusion for Single-Image Shadow Removal**
- Paper: https://arxiv.org/abs/2103.01255
- Code: https://github.com/tsingqguo/exposure-fusion-shadow-removal
# Virtual Try-On
**Parser-Free Virtual Try-on via Distilling Appearance Flows**
- Paper: https://arxiv.org/abs/2103.04559
- Code: https://github.com/geyuying/PF-AFN
# Label Noise
**A Second-Order Approach to Learning with Instance-Dependent Label Noise**
- Paper(Oral): https://arxiv.org/abs/2012.11854
- Code: https://github.com/UCSC-REAL/CAL
# Video Stabilization
**Real-Time Selfie Video Stabilization**
- Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Yu_Real-Time_Selfie_Video_Stabilization_CVPR_2021_paper.pdf
- Code: https://github.com/jiy173/selfievideostabilization
# Datasets
**Tracking Pedestrian Heads in Dense Crowd**
- Homepage: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sundararaman_Tracking_Pedestrian_Heads_in_Dense_Crowd_CVPR_2021_paper.html
- Code1: https://github.com/Sentient07/HeadHunter
- Code2: https://github.com/Sentient07/HeadHunter%E2%80%93T
- Dataset: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/
**Part-aware Panoptic Segmentation**
- Paper: https://arxiv.org/abs/2106.06351
- Code: https://github.com/tue-mps/panoptic_parts
- Dataset: https://github.com/tue-mps/panoptic_parts
**Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos**
- Homepage: https://www.yasamin.page/hdnet_tiktok
- Paper(Oral): https://arxiv.org/abs/2103.03319
- Code: https://github.com/yasaminjafarian/HDNet_TikTok
- Dataset: https://www.yasamin.page/hdnet_tiktok#h.jr9ifesshn7v
**High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network**
- Paper: https://arxiv.org/abs/2105.09188
- Code: https://github.com/csjliang/LPTN
- Dataset: https://github.com/csjliang/LPTN
**Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark**
- Paper: https://arxiv.org/abs/2105.02440
- Code: https://github.com/VisDrone/DroneCrowd
- Dataset: https://github.com/VisDrone/DroneCrowd
**Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets**
- Homepage: https://fidler-lab.github.io/efficient-annotation-cookbook/
- Paper(Oral): https://arxiv.org/abs/2104.12690
- Code: https://github.com/fidler-lab/efficient-annotation-cookbook
**ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation**
- Paper: https://arxiv.org/abs/2012.05258
- Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab
- Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab
**Learning To Count Everything**
- Paper: https://arxiv.org/abs/2104.08391
- Code: https://github.com/cvlab-stonybrook/LearningToCountEverything
- Dataset: https://github.com/cvlab-stonybrook/LearningToCountEverything
**Semantic Image Matting**
- Paper: https://arxiv.org/abs/2104.08201
- Code: https://github.com/nowsyn/SIM
- Dataset: https://github.com/nowsyn/SIM
**Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline**
- Homepage: http://mepro.bjtu.edu.cn/resource.html
- Paper: https://arxiv.org/abs/2104.06174
- Code: None
**Visual Semantic Role Labeling for Video Understanding**
- Homepage: https://vidsitu.org/
- Paper: https://arxiv.org/abs/2104.00990
- Code: https://github.com/TheShadow29/VidSitu
- Dataset: https://github.com/TheShadow29/VidSitu
**VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild**
- Homepage: https://www.vspwdataset.com/
- Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf
- GitHub: https://github.com/sssdddwww2/vspw_dataset_download
**Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark**
- Homepage: https://vap.aau.dk/sewer-ml/
- Paper: https://arxiv.org/abs/2103.10895
**Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food**
- Paper: https://arxiv.org/abs/2103.03375
- Dataset: None
**Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges**
- Homepage: https://github.com/QingyongHu/SensatUrban
- Paper: http://arxiv.org/abs/2009.03137
- Code: https://github.com/QingyongHu/SensatUrban
- Dataset: https://github.com/QingyongHu/SensatUrban
**When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework**
- Paper(Oral): https://arxiv.org/abs/2103.01520
- Code: https://github.com/Hzzone/MTLFace
- Dataset: https://github.com/Hzzone/MTLFace
**Depth from Camera Motion and Object Detection**
- Paper: https://arxiv.org/abs/2103.01468
- Code: https://github.com/griffbr/ODMD
- Dataset: https://github.com/griffbr/ODMD
**Scan2Cap: Context-aware Dense Captioning in RGB-D Scans**
- Paper: https://arxiv.org/abs/2012.02206
- Code: https://github.com/daveredrum/Scan2Cap
- Dataset: https://github.com/daveredrum/ScanRefer
**There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge**
- Paper: https://arxiv.org/abs/2103.01353
- Code: http://rl.uni-freiburg.de/research/multimodal-distill
- Dataset: http://rl.uni-freiburg.de/research/multimodal-distill
# Others
**Fast and Accurate Model Scaling**
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Dollar_Fast_and_Accurate_Model_Scaling_CVPR_2021_paper.html
- Code: https://github.com/facebookresearch/pycls
**Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos**
- Homepage: https://www.yasamin.page/hdnet_tiktok
- Paper(Oral): https://arxiv.org/abs/2103.03319
- Code: https://github.com/yasaminjafarian/HDNet_TikTok
- Dataset: https://www.yasamin.page/hdnet_tiktok#h.jr9ifesshn7v
**Omnimatte: Associating Objects and Their Effects in Video**
- Homepage: https://omnimatte.github.io/
- Paper(Oral): https://arxiv.org/abs/2105.06993
- Code: https://omnimatte.github.io/#code
**Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets**
- Homepage: https://fidler-lab.github.io/efficient-annotation-cookbook/
- Paper(Oral): https://arxiv.org/abs/2104.12690
- Code: https://github.com/fidler-lab/efficient-annotation-cookbook
**Motion Representations for Articulated Animation**
- Paper: https://arxiv.org/abs/2104.11280
- Code: https://github.com/snap-research/articulated-animation
**Deep Lucas-Kanade Homography for Multimodal Image Alignment**
- Paper: https://arxiv.org/abs/2104.11693
- Code: https://github.com/placeforyiming/CVPR21-Deep-Lucas-Kanade-Homography
**Skip-Convolutions for Efficient Video Processing**
- Paper: https://arxiv.org/abs/2104.11487
- Code: None
**KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control**
- Homepage: http://tomasjakab.github.io/KeypointDeformer
- Paper(Oral): https://arxiv.org/abs/2104.11224
- Code: https://github.com/tomasjakab/keypoint_deformer/
**Learning To Count Everything**
- Paper: https://arxiv.org/abs/2104.08391
- Code: https://github.com/cvlab-stonybrook/LearningToCountEverything
- Dataset: https://github.com/cvlab-stonybrook/LearningToCountEverything
**SOLD2: Self-supervised Occlusion-aware Line Description and Detection**
- Paper(Oral): https://arxiv.org/abs/2104.03362
- Code: https://github.com/cvg/SOLD2
**Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression**
- Homepage: https://li-wanhua.github.io/POEs/
- Paper: https://arxiv.org/abs/2103.13629
- Code: https://github.com/Li-Wanhua/POEs
**LEAP: Learning Articulated Occupancy of People**
- Paper: https://arxiv.org/abs/2104.06849
- Code: None
**Visual Semantic Role Labeling for Video Understanding**
- Homepage: https://vidsitu.org/
- Paper: https://arxiv.org/abs/2104.00990
- Code: https://github.com/TheShadow29/VidSitu
- Dataset: https://github.com/TheShadow29/VidSitu
**UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles**
- Paper: https://arxiv.org/abs/2104.00946
- Code: https://github.com/SUTDCV/UAV-Human
**Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning**
- Paper(Oral): https://arxiv.org/abs/2104.00924
- Code: None
**Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction**
- Paper: https://arxiv.org/abs/2104.00858
- Code: None
**Towards High Fidelity Face Relighting with Realistic Shadows**
- Paper: https://arxiv.org/abs/2104.00825
- Code: None
**BRepNet: A topological message passing system for solid models**
- Paper(Oral): https://arxiv.org/abs/2104.00706
- Code: None
**Visually Informed Binaural Audio Generation without Binaural Audios**
- Homepage: https://sheldontsui.github.io/projects/PseudoBinaural
- Paper: None
- GitHub: https://github.com/SheldonTsui/PseudoBinaural_CVPR2021
- Demo: https://www.youtube.com/watch?v=r-uC2MyAWQc
**Exploring intermediate representation for monocular vehicle pose estimation**
- Paper: None
- Code: https://github.com/Nicholasli1995/EgoNet
**Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGB**
- Paper(Oral): https://arxiv.org/abs/2103.14708
- Code: None
**Invertible Image Signal Processing**
- Paper: https://arxiv.org/abs/2103.15061
- Code: https://github.com/yzxing87/Invertible-ISP
**Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling**
- Paper: https://arxiv.org/abs/2103.14858
- Code: None
**SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences**
- Paper: https://arxiv.org/abs/2103.14898
- Code: None
**Embedding Transfer with Label Relaxation for Improved Metric Learning**
- Paper: https://arxiv.org/abs/2103.14908
- Code: None
**Picasso: A CUDA-based Library for Deep Learning over 3D Meshes**
- Paper: https://arxiv.org/abs/2103.15076
- Code: https://github.com/hlei-ziyan/Picasso
**Meta-Mining Discriminative Samples for Kinship Verification**
- Paper: https://arxiv.org/abs/2103.15108
- Code: None
**Cloud2Curve: Generation and Vectorization of Parametric Sketches**
- Paper: https://arxiv.org/abs/2103.15536
- Code: None
**TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events**
- Paper: https://arxiv.org/abs/2103.15538
- Code: https://github.com/SUTDCV/SUTD-TrafficQA
**Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution**
- Homepage: http://wellyzhang.github.io/project/prae.html
- Paper: https://arxiv.org/abs/2103.14230
- Code: None
**ACRE: Abstract Causal REasoning Beyond Covariation**
- Homepage: http://wellyzhang.github.io/project/acre.html
- Paper: https://arxiv.org/abs/2103.14232
- Code: None
**Confluent Vessel Trees with Accurate Bifurcations**
- Paper: https://arxiv.org/abs/2103.14268
- Code: None
**Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling**
- Paper: https://arxiv.org/abs/2103.14338
- Code: https://github.com/HuangZhiChao95/FewShotMotionTransfer
**Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks**
- Homepage: https://paschalidoud.github.io/neural_parts
- Paper: None
- Code: https://github.com/paschalidoud/neural_parts
**Knowledge Evolution in Neural Networks**
- Paper(Oral): https://arxiv.org/abs/2103.05152
- Code: https://github.com/ahmdtaha/knowledge_evolution
**Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning**
- Paper: https://arxiv.org/abs/2103.02148
- Code: https://github.com/guopengf/FLMRCM
**SGP: Self-supervised Geometric Perception**
- Paper(Oral): https://arxiv.org/abs/2103.03114
- Code: https://github.com/theNded/SGP
**Diffusion Probabilistic Models for 3D Point Cloud Generation**
- Paper: https://arxiv.org/abs/2103.01458
- Code: https://github.com/luost26/diffusion-point-cloud
**Scan2Cap: Context-aware Dense Captioning in RGB-D Scans**
- Paper: https://arxiv.org/abs/2012.02206
- Code: https://github.com/daveredrum/Scan2Cap
- Dataset: https://github.com/daveredrum/ScanRefer
**There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge**
- Paper: https://arxiv.org/abs/2103.01353
- Code: http://rl.uni-freiburg.de/research/multimodal-distill
- Dataset: http://rl.uni-freiburg.de/research/multimodal-distill
# To Be Added (TODO)
- [Big news! 20 papers from Tencent YouTu Lab accepted to CVPR 2021](https://mp.weixin.qq.com/s/McAtOVh0osWZ3uppEoHC8A)
- [Three papers from the MePro team accepted to CVPR 2021](https://mp.weixin.qq.com/s/GD5Zb6u_MQ8GZIAGeCGo3Q)
# Acceptance Unconfirmed (Not Sure)
**CT Film Recovery via Disentangling Geometric Deformation and Photometric Degradation: Simulated Datasets and Deep Models**
- Paper: none
- Code: https://github.com/transcendentsky/Film-Recovery
**Toward Explainable Reflection Removal with Distilling and Model Uncertainty**
- Paper: none
- Code: https://github.com/ytpeng-aimlab/CVPR-2021-Toward-Explainable-Reflection-Removal-with-Distilling-and-Model-Uncertainty
**DeepOIS: Gyroscope-Guided Deep Optical Image Stabilizer Compensation**
- Paper: none
- Code: https://github.com/lhaippp/DeepOIS
**Exploring Adversarial Fake Images on Face Manifold**
- Paper: none
- Code: https://github.com/ldz666666/Style-atk
**Uncertainty-Aware Semi-Supervised Crowd Counting via Consistency-Regularized Surrogate Task**
- Paper: none
- Code: https://github.com/yandamengdanai/Uncertainty-Aware-Semi-Supervised-Crowd-Counting-via-Consistency-Regularized-Surrogate-Task
**Temporal Contrastive Graph for Self-supervised Video Representation Learning**
- Paper: none
- Code: https://github.com/YangLiu9208/TCG
**Boosting Monocular Depth Estimation Models to High-Resolution via Context-Aware Patching**
- Paper: none
- Code: https://github.com/ouranonymouscvpr/cvpr2021_ouranonymouscvpr
**Fast and Memory-Efficient Compact Bilinear Pooling**
- Paper: none
- Code: https://github.com/cvpr2021kp2/cvpr2021kp2
**Identification of Empty Shelves in Supermarkets using Domain-inspired Features with Structural Support Vector Machine**
- Paper: none
- Code: https://github.com/gapDetection/cvpr2021
**Estimating A Child's Growth Potential From Cephalometric X-Ray Image via Morphology-Aware Interactive Keypoint Estimation**
- Paper: none
- Code: https://github.com/interactivekeypoint2020/Morph
- https://github.com/ShaoQiangShen/CVPR2021
- https://github.com/gillesflash/CVPR2021
- https://github.com/anonymous-submission1991/BaLeNAS
- https://github.com/cvpr2021dcb/cvpr2021dcb
- https://github.com/anonymousauthorCV/CVPR2021_PaperID_8578
- https://github.com/AldrichZeng/FreqPrune
- https://github.com/Anonymous-AdvCAM/Anonymous-AdvCAM
- https://github.com/ddfss/datadrive-fss
================================================
FILE: CVPR2022-Papers-with-Code.md
================================================
# CVPR 2022 Papers and Open-Source Projects (Papers with Code)
A collection of [CVPR 2022](https://cvpr2022.thecvf.com/) papers and open-source projects (papers with code)!
CVPR 2022 accepted paper ID list: https://drive.google.com/file/d/15JFhfPboKdUcIH9LdbCMUFmGq_JhaxhC/view
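> Note 1: Contributions are welcome! Please open an issue to share CVPR 2022 papers and open-source projects.
>
> Note 2: For papers from top CV conferences in previous years, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision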
>
> - [CVPR 2019](CVPR2019-Papers-with-Code.md)
> - [CVPR 2020](CVPR2020-Papers-with-Code.md)
> - [CVPR 2021](CVPR2021-Papers-with-Code.md)
## CVPR 2022 Papers with Code: Table of Contents
- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [GAN](#GAN)
- [GNN](#GNN)
- [MLP](#MLP)
- [NAS](#NAS)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [3D Face](#3D-Face)
- [Long-Tail](#Long-Tail)
- [Visual Transformer](#Visual-Transformer)
- [Vision-Language](#VL)
- [Self-supervised Learning](#SSL)
- [Data Augmentation](#DA)
- [Knowledge Distillation](#KD)
- [Object Detection](#Object-Detection)
- [Visual Tracking](#VT)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Panoptic Segmentation](#Panoptic-Segmentation)
- [Few-Shot Classification](#FFC)
- [Few-Shot Segmentation](#FFS)
- [Image Matting](#Matting)
- [Video Understanding](#VU)
- [Image Editing](#Image-Editing)
- [Low-level Vision](#LLV)
- [Super-Resolution](#Super-Resolution)
- [Deblur](#Deblur)
- [3D Point Cloud](#3D-Point-Cloud)
- [3D Object Detection](#3D-Object-Detection)
- [3D Semantic Segmentation](#3DSS)
- [3D Object Tracking](#3D-Object-Tracking)
- [3D Human Pose Estimation](#3D-Human-Pose-Estimation)
- [3D Semantic Scene Completion](#3DSSC)
- [3D Reconstruction](#3D-R)
- [Person Re-identification](#ReID)
- [Camouflaged Object Detection](#COD)
- [Depth Estimation](#Depth-Estimation)
- [Stereo Matching](#Stereo-Matching)
- [Feature Matching](#FM)
- [Lane Detection](#Lane-Detection)
- [Optical Flow Estimation](#Optical-Flow-Estimation)
- [Image Inpainting](#Image-Inpainting)
- [Image Retrieval](#Image-Retrieval)
- [Face Recognition](#Face-Recognition)
- [Crowd Counting](#Crowd-Counting)
- [Medical Image](#Medical-Image)
- [Video Generation](#Video-Generation)
- [Scene Graph Generation](#Scene-Graph-Generation)
- [Referring Video Object Segmentation](#R-VOS)
- [Gait Recognition](#GR)
- [Style Transfer](#ST)
- [Anomaly Detection](#AD)
- [Adversarial Examples](#AE)
- [Weakly Supervised Object Localization](#WSOL)
- [Radar Object Detection](#ROD)
- [Hyperspectral Image Reconstruction](#HSI)
- [Image Stitching](#Image-Stitching)
- [Watermarking](#Watermarking)
- [Action Counting](#AC)
- [Grounded Situation Recognition](#GSR)
- [Zero-shot Learning](#ZSL)
- [DeepFakes](#DeepFakes)
- [Datasets](#Datasets)
- [New Tasks](#New-Tasks)
- [Others](#Others)
# Backbone
**A ConvNet for the 2020s**
- Paper: https://arxiv.org/abs/2201.03545
- Code: https://github.com/facebookresearch/ConvNeXt
- Explanation (in Chinese): https://mp.weixin.qq.com/s/Xg5wPYExnvTqRo6s5-2cAw
**Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs**
- Paper: https://arxiv.org/abs/2203.06717
- Code: https://github.com/megvii-research/RepLKNet
- Code2: https://github.com/DingXiaoH/RepLKNet-pytorch
- Explanation (in Chinese): https://mp.weixin.qq.com/s/_qXyIQut-JRW6VvsjaQlFg
**MPViT: Multi-Path Vision Transformer for Dense Prediction**
- Paper: https://arxiv.org/abs/2112.11010
- Code: https://github.com/youngwanLEE/MPViT
- Explanation (in Chinese): https://mp.weixin.qq.com/s/Q9-crEOz5IYzZaNoq8oXfg
**Mobile-Former: Bridging MobileNet and Transformer**
- Paper: https://arxiv.org/abs/2108.05895
- Code: None
- Explanation (in Chinese): https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ
**MetaFormer is Actually What You Need for Vision**
- Paper: https://arxiv.org/abs/2111.11418
- Code: https://github.com/sail-sg/poolformer
**Shunted Self-Attention via Multi-Scale Token Aggregation**
- Paper(Oral): https://arxiv.org/abs/2111.15193
- Code: https://github.com/OliverRensu/Shunted-Transformer
**TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing**
- Paper: http://arxiv.org/abs/2203.10489
- Code: https://github.com/JierunChen/TVConv
**Learned Queries for Efficient Local Attention**
- Paper(Oral): https://arxiv.org/abs/2112.11435
- Code: https://github.com/moabarar/qna
**RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality**
- Paper: https://arxiv.org/abs/2112.11081
- Code: https://github.com/DingXiaoH/RepMLP
# CLIP
**HairCLIP: Design Your Hair by Text and Reference Image**
- Paper: https://arxiv.org/abs/2112.05142
- Code: https://github.com/wty-ustc/HairCLIP
**PointCLIP: Point Cloud Understanding by CLIP**
- Paper: https://arxiv.org/abs/2112.02413
- Code: https://github.com/ZrrSkywalker/PointCLIP
**Blended Diffusion for Text-driven Editing of Natural Images**
- Paper: https://arxiv.org/abs/2111.14818
- Code: https://github.com/omriav/blended-diffusion
# GAN
**SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing**
- Homepage: https://semanticstylegan.github.io/
- Paper: https://arxiv.org/abs/2112.02236
- Demo: https://semanticstylegan.github.io/videos/demo.mp4
**Style Transformer for Image Inversion and Editing**
- Paper: https://arxiv.org/abs/2203.07932
- Code: https://github.com/sapphire497/style-transformer
**Unsupervised Image-to-Image Translation with Generative Prior**
- Homepage: https://www.mmlab-ntu.com/project/gpunit/
- Paper: https://arxiv.org/abs/2204.03641
- Code: https://github.com/williamyang1991/GP-UNIT
**StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2**
- Homepage: https://universome.github.io/stylegan-v
- Paper: https://arxiv.org/abs/2112.14683
- Code: https://github.com/universome/stylegan-v
**OSSGAN: Open-set Semi-supervised Image Generation**
- Paper: https://arxiv.org/abs/2204.14249
- Code: https://github.com/raven38/OSSGAN
**Neural Texture Extraction and Distribution for Controllable Person Image Synthesis**
- Paper: https://arxiv.org/abs/2204.06160
- Code: https://github.com/RenYurui/Neural-Texture-Extraction-Distribution
# GNN
**OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks**
- Paper: https://wanyu-lin.github.io/assets/publications/wanyu-cvpr2022.pdf
- Code: https://github.com/WanyuGroup/CVPR2022-OrphicX
# MLP
**RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality**
- Paper: https://arxiv.org/abs/2112.11081
- Code: https://github.com/DingXiaoH/RepMLP
# NAS
**β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search**
- Paper: https://arxiv.org/abs/2203.01665
- Code: https://github.com/Sunshine-Ye/Beta-DARTS
**ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior**
- Paper: https://arxiv.org/abs/2111.15362
- Code: None
# OCR
**SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition**
- Paper: https://arxiv.org/abs/2203.10209
- Code: https://github.com/mxin262/SwinTextSpotter
# NeRF
**Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields**
- Homepage: https://jonbarron.info/mipnerf360/
- Paper: https://arxiv.org/abs/2111.12077
- Demo: https://youtu.be/YStDS2-Ln1s
**Point-NeRF: Point-based Neural Radiance Fields**
- Homepage: https://xharlie.github.io/projects/project_sites/pointnerf/
- Paper: https://arxiv.org/abs/2201.08845
- Code: https://github.com/Xharlie/point-nerf
**NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images**
- Paper: https://arxiv.org/abs/2111.13679
- Homepage: https://bmild.github.io/rawnerf/
- Demo: https://www.youtube.com/watch?v=JtBS4KBcKVc
**Urban Radiance Fields**
- Homepage: https://urban-radiance-fields.github.io/
- Paper: https://arxiv.org/abs/2111.14643
- Demo: https://youtu.be/qGlq5DZT6uc
**Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation**
- Paper: https://arxiv.org/abs/2202.13162
- Code: https://github.com/HexagonPrime/Pix2NeRF
**HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video**
- Homepage: https://grail.cs.washington.edu/projects/humannerf/
- Paper: https://arxiv.org/abs/2201.04127
- Demo: https://youtu.be/GM-RoZEymmw
# 3D Face
**ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural Representations**
- Paper: https://arxiv.org/abs/2203.14510
- Code: https://github.com/MingwuZheng/ImFace
# Long-Tail
**Retrieval Augmented Classification for Long-Tail Visual Recognition**
- Paper: https://arxiv.org/abs/2202.11233
- Code: None
# Visual Transformer
## Backbone
**MPViT: Multi-Path Vision Transformer for Dense Prediction**
- Paper: https://arxiv.org/abs/2112.11010
- Code: https://github.com/youngwanLEE/MPViT
**MetaFormer is Actually What You Need for Vision**
- Paper: https://arxiv.org/abs/2111.11418
- Code: https://github.com/sail-sg/poolformer
**Mobile-Former: Bridging MobileNet and Transformer**
- Paper: https://arxiv.org/abs/2108.05895
- Code: None
- Explanation (in Chinese): https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ
**Shunted Self-Attention via Multi-Scale Token Aggregation**
- Paper(Oral): https://arxiv.org/abs/2111.15193
- Code: https://github.com/OliverRensu/Shunted-Transformer
**Learned Queries for Efficient Local Attention**
- Paper(Oral): https://arxiv.org/abs/2112.11435
- Code: https://github.com/moabarar/qna
## Application
**Language-based Video Editing via Multi-Modal Multi-Level Transformer**
- Paper: https://arxiv.org/abs/2104.01122
- Code: None
**MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video**
- Paper: https://arxiv.org/abs/2203.00859
- Code: None
**Embracing Single Stride 3D Object Detector with Sparse Transformer**
- Paper: https://arxiv.org/abs/2112.06375
- Code: https://github.com/TuSimple/SST
- Explanation (in Chinese): https://zhuanlan.zhihu.com/p/476056546
**Multi-class Token Transformer for Weakly Supervised Semantic Segmentation**
- Paper: https://arxiv.org/abs/2203.02891
- Code: https://github.com/xulianuwa/MCTformer
**Spatio-temporal Relation Modeling for Few-shot Action Recognition**
- Paper: https://arxiv.org/abs/2112.05132
- Code: https://github.com/Anirudh257/strm
**Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction**
- Paper: https://arxiv.org/abs/2111.07910
- Code: https://github.com/caiyuanhao1998/MST
**Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling**
- Homepage: https://point-bert.ivg-research.xyz/
- Paper: https://arxiv.org/abs/2111.14819
- Code: https://github.com/lulutang0608/Point-BERT
**GroupViT: Semantic Segmentation Emerges from Text Supervision**
- Homepage: https://jerryxu.net/GroupViT/
- Paper: https://arxiv.org/abs/2202.11094
- Demo: https://youtu.be/DtJsWIUTW-Y
**Restormer: Efficient Transformer for High-Resolution Image Restoration**
- Paper: https://arxiv.org/abs/2111.09881
- Code: https://github.com/swz30/Restormer
**Splicing ViT Features for Semantic Appearance Transfer**
- Homepage: https://splice-vit.github.io/
- Paper: https://arxiv.org/abs/2201.00424
- Code: https://github.com/omerbt/Splice
**Self-supervised Video Transformer**
- Homepage: https://kahnchana.github.io/svt/
- Paper: https://arxiv.org/abs/2112.01514
- Code: https://github.com/kahnchana/svt
**Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers**
- Paper: https://arxiv.org/abs/2203.02664
- Code: https://github.com/rulixiang/afa
**Accelerating DETR Convergence via Semantic-Aligned Matching**
- Paper: https://arxiv.org/abs/2203.06883
- Code: https://github.com/ZhangGongjie/SAM-DETR
**DN-DETR: Accelerate DETR Training by Introducing Query DeNoising**
- Paper: https://arxiv.org/abs/2203.01305
- Code: https://github.com/FengLi-ust/DN-DETR
- Explanation (in Chinese): https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w
**Style Transformer for Image Inversion and Editing**
- Paper: https://arxiv.org/abs/2203.07932
- Code: https://github.com/sapphire497/style-transformer
**MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer**
- Paper: https://arxiv.org/abs/2203.10981
- Code: https://github.com/kuanchihhuang/MonoDTR
**Mask Transfiner for High-Quality Instance Segmentation**
- Paper: https://arxiv.org/abs/2111.13673
- Code: https://github.com/SysCV/transfiner
**Language as Queries for Referring Video Object Segmentation**
- Paper: https://arxiv.org/abs/2201.00487
- Code: https://github.com/wjn922/ReferFormer
- Explanation (in Chinese): https://mp.weixin.qq.com/s/MkQT8QWSYoYVhJ1RSF6oPQ
**X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning**
- Paper: https://arxiv.org/abs/2203.00843
- Code: https://github.com/CurryYuan/X-Trans2Cap
**AdaMixer: A Fast-Converging Query-Based Object Detector**
- Paper(Oral): https://arxiv.org/abs/2203.16507
- Code: https://github.com/MCG-NJU/AdaMixer
**Omni-DETR: Omni-Supervised Object Detection with Transformers**
- Paper: https://arxiv.org/abs/2203.16089
- Code: https://github.com/amazon-research/omni-detr
**SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition**
- Paper: https://arxiv.org/abs/2203.10209
- Code: https://github.com/mxin262/SwinTextSpotter
**TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting**
- Paper(Oral): https://arxiv.org/abs/2204.01018
- Code: https://github.com/SvipRepetitionCounting/TransRAC
**Collaborative Transformers for Grounded Situation Recognition**
- Paper: https://arxiv.org/abs/2203.16518
- Code: https://github.com/jhcho99/CoFormer
**NFormer: Robust Person Re-identification with Neighbor Transformer**
- Paper: https://arxiv.org/abs/2204.09331
- Code: https://github.com/haochenheheda/NFormer
**Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation**
- Paper: https://arxiv.org/abs/2201.06889
- Code: None
**Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer**
- Paper(Oral): https://arxiv.org/abs/2204.08680
- Code: https://github.com/zengwang430521/TCFormer
**A New Dataset and Transformer for Stereoscopic Video Super-Resolution**
- Paper: https://arxiv.org/abs/2204.10039
- Code: https://github.com/H-deep/Trans-SVSR/
- Dataset: http://shorturl.at/mpwGX
**Safe Self-Refinement for Transformer-based Domain Adaptation**
- Paper: https://arxiv.org/abs/2204.07683
- Code: https://github.com/tsun/SSRT
**Fast Point Transformer**
- Homepage: http://cvlab.postech.ac.kr/research/FPT/
- Paper: https://arxiv.org/abs/2112.04702
- Code: https://github.com/POSTECH-CVLab/FastPointTransformer
**Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval**
- Paper: https://arxiv.org/abs/2204.09730
- Code: https://github.com/mshukor/TFood
**DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation**
- Paper: https://arxiv.org/abs/2111.14887
- Code: https://github.com/lhoyer/DAFormer
**Stratified Transformer for 3D Point Cloud Segmentation**
- Paper: https://arxiv.org/pdf/2203.14508.pdf
- Code: https://github.com/dvlab-research/Stratified-Transformer
# Vision-Language
**Conditional Prompt Learning for Vision-Language Models**
- Paper: https://arxiv.org/abs/2203.05557
- Code: https://github.com/KaiyangZhou/CoOp
**Bridging Video-text Retrieval with Multiple Choice Question**
- Paper: https://arxiv.org/abs/2201.04850
- Code: https://github.com/TencentARC/MCQ
**Visual Abductive Reasoning**
- Paper: https://arxiv.org/abs/2203.14040
- Code: https://github.com/leonnnop/VAR
# Self-supervised Learning
**UniVIP: A Unified Framework for Self-Supervised Visual Pre-training**
- Paper: https://arxiv.org/abs/2203.06965
- Code: None
**Crafting Better Contrastive Views for Siamese Representation Learning**
- Paper: https://arxiv.org/abs/2202.03278
- Code: https://github.com/xyupeng/ContrastiveCrop
- Explanation (in Chinese): https://mp.weixin.qq.com/s/VTP9D5f7KG9vg30U9kVI2A
**HCSC: Hierarchical Contrastive Selective Coding**
- Code: https://github.com/gyfastas/HCSC
- Paper: https://arxiv.org/abs/2202.00455
- Explanation (in Chinese): https://mp.weixin.qq.com/s/jkYR8mYp-e645qk8kfPNKQ
**DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis**
- Paper: https://arxiv.org/abs/2204.10437
- Code: https://github.com/JLiangLab/DiRA
# Data Augmentation
**TeachAugment: Data Augmentation Optimization Using Teacher Knowledge**
- Paper: https://arxiv.org/abs/2202.12513
- Code: https://github.com/DensoITLab/TeachAugment
**AlignMixup: Improving Representations By Interpolating Aligned Features**
- Paper: https://arxiv.org/abs/2103.15375
- Code: https://github.com/shashankvkt/AlignMixup_CVPR22
# Knowledge Distillation
**Decoupled Knowledge Distillation**
- Paper: https://arxiv.org/abs/2203.08679
- Code: https://github.com/megvii-research/mdistiller
- Explanation (in Chinese): https://mp.weixin.qq.com/s/-4AA0zKIXh9Ei9-vc5jOhw
# Object Detection
**BoxeR: Box-Attention for 2D and 3D Transformers**
- Paper: https://arxiv.org/abs/2111.13087
- Code: https://github.com/kienduynguyen/BoxeR
- Explanation (in Chinese): https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w
**DN-DETR: Accelerate DETR Training by Introducing Query DeNoising**
- Paper: https://arxiv.org/abs/2203.01305
- Code: https://github.com/FengLi-ust/DN-DETR
- Explanation (in Chinese): https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w
**Accelerating DETR Convergence via Semantic-Aligned Matching**
- Paper: https://arxiv.org/abs/2203.06883
- Code: https://github.com/ZhangGongjie/SAM-DETR
**Localization Distillation for Dense Object Detection**
- Paper: https://arxiv.org/abs/2102.12252
- Code: https://github.com/HikariTJU/LD
- Explanation (in Chinese): https://mp.weixin.qq.com/s/dxss8RjJH283h6IbPCT9vg
**Focal and Global Knowledge Distillation for Detectors**
- Paper: https://arxiv.org/abs/2111.11837
- Code: https://github.com/yzd-v/FGD
- Explanation (in Chinese): https://mp.weixin.qq.com/s/yDkreTudC8JL2V2ETsADwQ
**A Dual Weighting Label Assignment Scheme for Object Detection**
- Paper: https://arxiv.org/abs/2203.09730
- Code: https://github.com/strongwolf/DW
**AdaMixer: A Fast-Converging Query-Based Object Detector**
- Paper(Oral): https://arxiv.org/abs/2203.16507
- Code: https://github.com/MCG-NJU/AdaMixer
**Omni-DETR: Omni-Supervised Object Detection with Transformers**
- Paper: https://arxiv.org/abs/2203.16089
- Code: https://github.com/amazon-research/omni-detr
**SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection**
- Paper(Oral): https://arxiv.org/abs/2203.06398
- Code: https://github.com/CityU-AIM-Group/SIGMA
## Semi-Supervised Object Detection
**Dense Learning based Semi-Supervised Object Detection**
- Paper: https://arxiv.org/abs/2204.07300
- Code: https://github.com/chenbinghui1/DSL
# Visual Tracking
**Correlation-Aware Deep Tracking**
- Paper: https://arxiv.org/abs/2203.01666
- Code: None
**TCTrack: Temporal Contexts for Aerial Tracking**
- Paper: https://arxiv.org/abs/2203.01885
- Code: https://github.com/vision4robotics/TCTrack
## Multi-Modal Object Tracking
**Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline**
- Homepage: https://zhang-pengyu.github.io/DUT-VTUAV/
- Paper: https://arxiv.org/abs/2204.04120
## Multi-Object Tracking
**Learning of Global Objective for Network Flow in Multi-Object Tracking**
- Paper: https://arxiv.org/abs/2203.16210
- Code: None
**DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion**
- Homepage: https://dancetrack.github.io
- Paper: https://arxiv.org/abs/2111.14690
- Dataset: https://github.com/DanceTrack/DanceTrack
# Semantic Segmentation
**Novel Class Discovery in Semantic Segmentation**
- Homepage: https://ncdss.github.io/
- Paper: https://arxiv.org/abs/2112.01900
- Code: https://github.com/HeliosZhao/NCDSS
**Deep Hierarchical Semantic Segmentation**
- Paper: https://arxiv.org/abs/2203.14335
- Code: https://github.com/0liliulei/HieraSeg
**Rethinking Semantic Segmentation: A Prototype View**
- Paper(Oral): https://arxiv.org/abs/2203.15102
- Code: https://github.com/tfzhou/ProtoSeg
## Weakly Supervised Semantic Segmentation
**Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation**
- Paper: https://arxiv.org/abs/2203.00962
- Code: https://github.com/zhaozhengChen/ReCAM
**Multi-class Token Transformer for Weakly Supervised Semantic Segmentation**
- Paper: https://arxiv.org/abs/2203.02891
- Code: https://github.com/xulianuwa/MCTformer
**Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers**
- Paper: https://arxiv.org/abs/2203.02664
- Code: https://github.com/rulixiang/afa
**CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation**
- Paper: https://arxiv.org/abs/2203.02668
- Code: https://github.com/CVI-SZU/CLIMS
**CCAM: Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation**
- Paper: https://arxiv.org/abs/2203.13505
- Code: https://github.com/CVI-SZU/CCAM
**FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation**
- Homepage: http://cvlab.postech.ac.kr/research/FIFO/
- Paper(Oral): https://arxiv.org/abs/2204.01587
- Code: https://github.com/sohyun-l/FIFO
**Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation**
- Paper: https://arxiv.org/abs/2203.09653
- Code: https://github.com/maeve07/RCA.git
## Semi-Supervised Semantic Segmentation
**ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation**
- Paper: https://arxiv.org/abs/2106.05095
- Code: https://github.com/LiheYoung/ST-PlusPlus
- Explanation (in Chinese): https://mp.weixin.qq.com/s/knSnlebdtEnmrkChGM_0CA
**Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels**
- Homepage: https://haochen-wang409.github.io/U2PL/
- Paper: https://arxiv.org/abs/2203.03884
- Code: https://github.com/Haochen-Wang409/U2PL
- Explanation (in Chinese): https://mp.weixin.qq.com/s/-08olqE7np8A1XQzt6HAgQ
**Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation**
- Paper: https://arxiv.org/pdf/2111.12903.pdf
- Code: https://github.com/yyliu01/PS-MT
## Domain Adaptive Semantic Segmentation
**Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation**
- Paper: https://arxiv.org/abs/2111.12940
- Code: https://github.com/BIT-DA/RIPU
**DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation**
- Paper: https://arxiv.org/abs/2111.14887
- Code: https://github.com/lhoyer/DAFormer
## Unsupervised Semantic Segmentation
**GroupViT: Semantic Segmentation Emerges from Text Supervision**
- Homepage: https://jerryxu.net/GroupViT/
- Paper: https://arxiv.org/abs/2202.11094
- Demo: https://youtu.be/DtJsWIUTW-Y
## Few-Shot Semantic Segmentation
**Generalized Few-shot Semantic Segmentation**
- Paper: https://jiaya.me/papers/cvpr22_zhuotao.pdf
- Code: https://github.com/dvlab-research/GFS-Seg
# Instance Segmentation
**BoxeR: Box-Attention for 2D and 3D Transformers**
- Paper: https://arxiv.org/abs/2111.13087
- Code: https://github.com/kienduynguyen/BoxeR
- Explanation (in Chinese): https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w
**E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation**
- Paper: https://arxiv.org/abs/2203.04074
- Code: https://github.com/zhang-tao-whu/e2ec
**Mask Transfiner for High-Quality Instance Segmentation**
- Paper: https://arxiv.org/abs/2111.13673
- Code: https://github.com/SysCV/transfiner
**Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity**
- Homepage: https://sites.google.com/view/generic-grouping/
- Paper: https://arxiv.org/abs/2204.06107
- Code: https://github.com/facebookresearch/Generic-Grouping
## Self-Supervised Instance Segmentation
**FreeSOLO: Learning to Segment Objects without Annotations**
- Paper: https://arxiv.org/abs/2202.12181
- Code: https://github.com/NVlabs/FreeSOLO
## Video Instance Segmentation
**Efficient Video Instance Segmentation via Tracklet Query and Proposal**
- Homepage: https://jialianwu.com/projects/EfficientVIS.html
- Paper: https://arxiv.org/abs/2203.01853
- Demo: https://youtu.be/sSPMzgtMKCE
**Temporally Efficient Vision Transformer for Video Instance Segmentation**
- Paper: https://arxiv.org/abs/2204.08412
- Code: https://github.com/hustvl/TeViT
# Panoptic Segmentation
**Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers**
- Paper: https://arxiv.org/abs/2109.03814
- Code: https://github.com/zhiqi-li/Panoptic-SegFormer
**Large-scale Video Panoptic Segmentation in the Wild: A Benchmark**
- Paper: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset/blob/main/VIPSeg2022.pdf
- Code: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset
- Dataset: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset
# Few-Shot Classification
**Integrative Few-Shot Learning for Classification and Segmentation**
- Paper: https://arxiv.org/abs/2203.15712
- Code: https://github.com/dahyun-kang/ifsl
**Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification**
- Paper: https://arxiv.org/abs/2106.05517
- Code: https://github.com/LouieYang/MCL
# Few-Shot Segmentation
**Learning What Not to Segment: A New Perspective on Few-Shot Segmentation**
- Paper: https://arxiv.org/abs/2203.07615
- Code: https://github.com/chunbolang/BAM
**Integrative Few-Shot Learning for Classification and Segmentation**
- Paper: https://arxiv.org/abs/2203.15712
- Code: https://github.com/dahyun-kang/ifsl
**Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation**
- Paper: https://arxiv.org/abs/2204.10638
- Code: None
# Image Matting
**Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation**
- Paper: https://arxiv.org/abs/2201.06889
- Code: None
# Video Understanding
**Self-supervised Video Transformer**
- Homepage: https://kahnchana.github.io/svt/
- Paper: https://arxiv.org/abs/2112.01514
- Code: https://github.com/kahnchana/svt
**TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting**
- Paper(Oral): https://arxiv.org/abs/2204.01018
- Code: https://github.com/SvipRepetitionCounting/TransRAC
**FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment**
- Paper(Oral): https://arxiv.org/abs/2204.03646
- Dataset: https://github.com/xujinglin/FineDiving
- Code: https://github.com/xujinglin/FineDiving
- Explanation (in Chinese): https://mp.weixin.qq.com/s/8t12Y34eMNwvJr8PeryWXg
**Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition**
- Paper(Oral): https://arxiv.org/abs/2204.02148
- Code: None
## Action Recognition
**Spatio-temporal Relation Modeling for Few-shot Action Recognition**
- Paper: https://arxiv.org/abs/2112.05132
- Code: https://github.com/Anirudh257/strm
## Action Detection
**End-to-End Semi-Supervised Learning for Video Action Detection**
- Paper: https://arxiv.org/abs/2203.04251
- Code: None
# Image Editing
**Style Transformer for Image Inversion and Editing**
- Paper: https://arxiv.org/abs/2203.07932
- Code: https://github.com/sapphire497/style-transformer
**Blended Diffusion for Text-driven Editing of Natural Images**
- Paper: https://arxiv.org/abs/2111.14818
- Code: https://github.com/omriav/blended-diffusion
**SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing**
- Homepage: https://semanticstylegan.github.io/
- Paper: https://arxiv.org/abs/2112.02236
- Demo: https://semanticstylegan.github.io/videos/demo.mp4
# Low-level Vision
**ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior**
- Paper: https://arxiv.org/abs/2111.15362
- Code: None
**Restormer: Efficient Transformer for High-Resolution Image Restoration**
- Paper: https://arxiv.org/abs/2111.09881
- Code: https://github.com/swz30/Restormer
**Robust Equivariant Imaging: a fully unsupervised framework for learning to image from noisy and partial measurements**
- Paper(Oral): https://arxiv.org/abs/2111.12855
- Code: https://github.com/edongdongchen/REI
# Super-Resolution
## Image Super-Resolution
**Learning the Degradation Distribution for Blind Image Super-Resolution**
- Paper: https://arxiv.org/abs/2203.04962
- Code: https://github.com/greatlog/UnpairedSR
## Video Super-Resolution
**BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment**
- Paper: https://arxiv.org/abs/2104.13371
- Code: https://github.com/open-mmlab/mmediting
- Code2: https://github.com/ckkelvinchan/BasicVSR_PlusPlus
- Explanation (in Chinese): https://mp.weixin.qq.com/s/HZTwYfphixyLHxlbCAxx4g
**Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling**
- Paper: https://arxiv.org/abs/2204.07114
- Code: None
**A New Dataset and Transformer for Stereoscopic Video Super-Resolution**
- Paper: https://arxiv.org/abs/2204.10039
- Code: https://github.com/H-deep/Trans-SVSR/
- Dataset: http://shorturl.at/mpwGX
# Deblur
## Image Deblur
**Learning to Deblur using Light Field Generated and Real Defocus Images**
- Homepage: http://lyruan.com/Projects/DRBNet/
- Paper(Oral): https://arxiv.org/abs/2204.00442
- Code: https://github.com/lingyanruan/DRBNet
# 3D Point Cloud
**Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling**
- Homepage: https://point-bert.ivg-research.xyz/
- Paper: https://arxiv.org/abs/2111.14819
- Code: https://github.com/lulutang0608/Point-BERT
**A Unified Query-based Paradigm for Point Cloud Understanding**
- Paper: https://arxiv.org/abs/2203.01252
- Code: None
**CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding**
- Paper: https://arxiv.org/abs/2203.00680
- Code: https://github.com/MohamedAfham/CrossPoint
**PointCLIP: Point Cloud Understanding by CLIP**
- Paper: https://arxiv.org/abs/2112.02413
- Code: https://github.com/ZrrSkywalker/PointCLIP
**Fast Point Transformer**
- Homepage: http://cvlab.postech.ac.kr/research/FPT/
- Paper: https://arxiv.org/abs/2112.04702
- Code: https://github.com/POSTECH-CVLab/FastPointTransformer
**RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds**
- Paper: https://arxiv.org/abs/2205.11028
- Code: https://github.com/gxd1994/RCP
**The Devil is in the Pose: Ambiguity-free 3D Rotation-invariant Learning via Pose-aware Convolution**
- Paper: https://arxiv.org/abs/2205.15210
- Code: https://github.com/GostInShell/PaRI-Conv
# 3D Object Detection
**Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds**
- Paper(Oral): https://arxiv.org/abs/2203.11139
- Code: https://github.com/yifanzhang713/IA-SSD
- Demo: https://www.youtube.com/watch?v=3jP2o9KXunA
**BoxeR: Box-Attention for 2D and 3D Transformers**
- Paper: https://arxiv.org/abs/2111.13087
- Code: https://github.com/kienduynguyen/BoxeR
- Chinese write-up: https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w
**Embracing Single Stride 3D Object Detector with Sparse Transformer**
- Paper: https://arxiv.org/abs/2112.06375
- Code: https://github.com/TuSimple/SST
**Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes**
- Paper: https://arxiv.org/abs/2011.12001
- Code: https://github.com/qq456cvb/CanonicalVoting
**MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer**
- Paper: https://arxiv.org/abs/2203.10981
- Code: https://github.com/kuanchihhuang/MonoDTR
**HyperDet3D: Learning a Scene-conditioned 3D Object Detector**
- Paper: https://arxiv.org/abs/2204.05599
- Code: None
**OccAM's Laser: Occlusion-based Attribution Maps for 3D Object Detectors on LiDAR Data**
- Paper: https://arxiv.org/abs/2204.06577
- Code: https://github.com/dschinagl/occam
**DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection**
- Homepage: https://thudair.baai.ac.cn/index
- Paper: https://arxiv.org/abs/2204.05575
- Code: https://github.com/AIR-THU/DAIR-V2X
**Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions**
- Homepage: https://ithaca365.mae.cornell.edu/
- Paper: https://arxiv.org/abs/2208.01166
# 3D Semantic Segmentation
**Scribble-Supervised LiDAR Semantic Segmentation**
- Paper: https://arxiv.org/abs/2203.08537
- Dataset: https://github.com/ouenal/scribblekitti
**Stratified Transformer for 3D Point Cloud Segmentation**
- Paper: https://arxiv.org/pdf/2203.14508.pdf
- Code: https://github.com/dvlab-research/Stratified-Transformer
# 3D Instance Segmentation
**Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions**
- Homepage: https://ithaca365.mae.cornell.edu/
- Paper: https://arxiv.org/abs/2208.01166
# 3D Object Tracking
**Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds**
- Paper: https://arxiv.org/abs/2203.01730
- Code: https://github.com/Ghostish/Open3DSOT
**PTTR: Relational 3D Point Cloud Object Tracking with Transformer**
- Paper: https://arxiv.org/abs/2112.02857
- Code: https://github.com/Jasonkks/PTTR
# 3D Human Pose Estimation
**MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation**
- Paper: https://arxiv.org/abs/2111.12707
- Code: https://github.com/Vegetebird/MHFormer
- Chinese write-up: https://zhuanlan.zhihu.com/p/439459426
**MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video**
- Paper: https://arxiv.org/abs/2203.00859
- Code: None
**Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation**
- Paper: https://arxiv.org/abs/2203.07697
- Code: None
- Chinese write-up: https://mp.weixin.qq.com/s/L_F28IFLXvs5R4V9TTUpRw
**BEV: Putting People in their Place: Monocular Regression of 3D People in Depth**
- Homepage: https://arthur151.github.io/BEV/BEV.html
- Paper: https://arxiv.org/abs/2112.08274
- Code: https://github.com/Arthur151/ROMP
- Dataset: https://github.com/Arthur151/Relative_Human
- Demo: https://www.youtube.com/watch?v=Q62fj_6AxRI
# 3D Semantic Scene Completion
**MonoScene: Monocular 3D Semantic Scene Completion**
- Paper: https://arxiv.org/abs/2112.00726
- Code: https://github.com/cv-rits/MonoScene
# 3D Reconstruction
**BANMo: Building Animatable 3D Neural Models from Many Casual Videos**
- Homepage: https://banmo-www.github.io/
- Paper: https://arxiv.org/abs/2112.12761
- Code: https://github.com/facebookresearch/banmo
- Chinese write-up: https://mp.weixin.qq.com/s/NMHP8-xWwrX40vpGx55Qew
# Person Re-identification
**NFormer: Robust Person Re-identification with Neighbor Transformer**
- Paper: https://arxiv.org/abs/2204.09331
- Code: https://github.com/haochenheheda/NFormer
# Camouflaged Object Detection
**Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection**
- Paper: https://arxiv.org/abs/2203.02688
- Code: https://github.com/lartpang/ZoomNet
# Depth Estimation
## Monocular Depth Estimation
**NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation**
- Paper: https://arxiv.org/abs/2203.01502
- Code: None
**OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion**
- Paper: https://arxiv.org/abs/2203.00838
- Code: None
**Toward Practical Self-Supervised Monocular Indoor Depth Estimation**
- Paper: https://arxiv.org/abs/2112.02306
- Code: None
**P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior**
- Paper: https://arxiv.org/abs/2204.02091
- Code: https://github.com/SysCV/P3Depth
**Multi-Frame Self-Supervised Depth with Transformers**
- Homepage: https://sites.google.com/tri.global/depthformer
- Paper: https://arxiv.org/abs/2204.07616
- Code: None
# Stereo Matching
**ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching**
- Paper: https://arxiv.org/abs/2203.02146
- Code: https://github.com/gangweiX/ACVNet
# Feature Matching
**ClusterGNN: Cluster-based Coarse-to-Fine Graph Neural Network for Efficient Feature Matching**
- Paper: https://arxiv.org/abs/2204.11700
- Code: None
# Lane Detection
**Rethinking Efficient Lane Detection via Curve Modeling**
- Paper: https://arxiv.org/abs/2203.02431
- Code: https://github.com/voldemortX/pytorch-auto-drive
- Demo: https://user-images.githubusercontent.com/32259501/148680744-a18793cd-f437-461f-8c3a-b909c9931709.mp4
**A Keypoint-based Global Association Network for Lane Detection**
- Paper: https://arxiv.org/abs/2204.07335
- Code: https://github.com/Wolfwjs/GANet
# Optical Flow Estimation
**Imposing Consistency for Optical Flow Estimation**
- Paper: https://arxiv.org/abs/2204.07262
- Code: None
**Deep Equilibrium Optical Flow Estimation**
- Paper: https://arxiv.org/abs/2204.08442
- Code: https://github.com/locuslab/deq-flow
**GMFlow: Learning Optical Flow via Global Matching**
- Paper(Oral): https://arxiv.org/abs/2111.13680
- Code: https://github.com/haofeixu/gmflow
# Image Inpainting
**Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding**
- Paper: https://arxiv.org/abs/2203.00867
- Code: https://github.com/DQiaole/ZITS_inpainting
# Image Retrieval
**Correlation Verification for Image Retrieval**
- Paper(Oral): https://arxiv.org/abs/2204.01458
- Code: https://github.com/sungonce/CVNet
# Face Recognition
**AdaFace: Quality Adaptive Margin for Face Recognition**
- Paper(Oral): https://arxiv.org/abs/2204.00964
- Code: https://github.com/mk-minchul/AdaFace
# Crowd Counting
**Leveraging Self-Supervision for Cross-Domain Crowd Counting**
- Paper: https://arxiv.org/abs/2103.16291
- Code: None
# Medical Image
**BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation**
- Paper: https://arxiv.org/abs/2203.02533
- Code: None
**Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification**
- Paper: https://arxiv.org/abs/2111.12918
- Code: https://github.com/FBLADL/ACPL
**DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis**
- Paper: https://arxiv.org/abs/2204.10437
- Code: https://github.com/JLiangLab/DiRA
# Video Generation
**StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2**
- Homepage: https://universome.github.io/stylegan-v
- Paper: https://arxiv.org/abs/2112.14683
- Code: https://github.com/universome/stylegan-v
- Demo: https://kaust-cair.s3.amazonaws.com/stylegan-v/stylegan-v.mp4
# Scene Graph Generation
**SGTR: End-to-end Scene Graph Generation with Transformer**
- Paper: https://arxiv.org/abs/2112.12970
- Code: None
# Referring Video Object Segmentation
**Language as Queries for Referring Video Object Segmentation**
- Paper: https://arxiv.org/abs/2201.00487
- Code: https://github.com/wjn922/ReferFormer
**ReSTR: Convolution-free Referring Image Segmentation Using Transformers**
- Paper: https://arxiv.org/abs/2203.16768
- Code: None
# Gait Recognition
**Gait Recognition in the Wild with Dense 3D Representations and A Benchmark**
- Homepage: https://gait3d.github.io/
- Paper: https://arxiv.org/abs/2204.02569
- Code: https://github.com/Gait3D/Gait3D-Benchmark
# Style Transfer
**StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions**
- Homepage: https://lukashoel.github.io/stylemesh/
- Paper: https://arxiv.org/abs/2112.01530
- Code: https://github.com/lukasHoel/stylemesh
- Demo: https://www.youtube.com/watch?v=ZqgiTLcNcks
# Anomaly Detection
**UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection**
- Paper: https://arxiv.org/abs/2111.08644
- Dataset: https://github.com/lilygeorgescu/UBnormal
**Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection**
- Paper(Oral): https://arxiv.org/abs/2111.09099
- Code: https://github.com/ristea/sspcab
# Adversarial Examples
**Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon**
- Paper: https://arxiv.org/abs/2203.03818
- Code: https://github.com/hncszyq/ShadowAttack
**LAS-AT: Adversarial Training with Learnable Attack Strategy**
- Paper(Oral): https://arxiv.org/abs/2203.06616
- Code: https://github.com/jiaxiaojunQAQ/LAS-AT
**Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection**
- Paper: https://arxiv.org/abs/2112.04532
- Code: https://github.com/joellliu/SegmentAndComplete
# Weakly Supervised Object Localization
**Weakly Supervised Object Localization as Domain Adaption**
- Paper: https://arxiv.org/abs/2203.01714
- Code: https://github.com/zh460045050/DA-WSOL_CVPR2022
# Radar Object Detection
**Exploiting Temporal Relations on Radar Perception for Autonomous Driving**
- Paper: https://arxiv.org/abs/2204.01184
- Code: None
# Hyperspectral Image Reconstruction
**Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction**
- Paper: https://arxiv.org/abs/2111.07910
- Code: https://github.com/caiyuanhao1998/MST
# Image Stitching
**Deep Rectangling for Image Stitching: A Learning Baseline**
- Paper(Oral): https://arxiv.org/abs/2203.03831
- Code: https://github.com/nie-lang/DeepRectangling
- Dataset: https://github.com/nie-lang/DeepRectangling
- Chinese write-up: https://mp.weixin.qq.com/s/lp5AnrtO_9urp-Fv6Z0l2Q
# Watermarking
**Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings**
- Paper: https://arxiv.org/abs/2104.13450
- Code: None
# Action Counting
**TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting**
- Paper(Oral): https://arxiv.org/abs/2204.01018
- Dataset: https://svip-lab.github.io/dataset/RepCount_dataset.html
- Code: https://github.com/SvipRepetitionCounting/TransRAC
# Grounded Situation Recognition
**Collaborative Transformers for Grounded Situation Recognition**
- Paper: https://arxiv.org/abs/2203.16518
- Code: https://github.com/jhcho99/CoFormer
# Zero-shot Learning
**Unseen Classes at a Later Time? No Problem**
- Paper: https://arxiv.org/abs/2203.16517
- Code: https://github.com/sumitramalagi/Unseen-classes-at-a-later-time
# DeepFakes
**Detecting Deepfakes with Self-Blended Images**
- Paper(Oral): https://arxiv.org/abs/2204.08376
- Code: https://github.com/mapooon/SelfBlendedImages
# Datasets
**It's About Time: Analog Clock Reading in the Wild**
- Homepage: https://charigyang.github.io/abouttime/
- Paper: https://arxiv.org/abs/2111.09162
- Code: https://github.com/charigyang/itsabouttime
- Demo: https://youtu.be/cbiMACA6dRc
**Toward Practical Self-Supervised Monocular Indoor Depth Estimation**
- Paper: https://arxiv.org/abs/2112.02306
- Code: None
**Kubric: A scalable dataset generator**
- Paper: https://arxiv.org/abs/2203.03570
- Code: https://github.com/google-research/kubric
- Chinese write-up: https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg
**Scribble-Supervised LiDAR Semantic Segmentation**
- Paper: https://arxiv.org/abs/2203.08537
- Dataset: https://github.com/ouenal/scribblekitti
**Deep Rectangling for Image Stitching: A Learning Baseline**
- Paper(Oral): https://arxiv.org/abs/2203.03831
- Code: https://github.com/nie-lang/DeepRectangling
- Dataset: https://github.com/nie-lang/DeepRectangling
- Chinese write-up: https://mp.weixin.qq.com/s/lp5AnrtO_9urp-Fv6Z0l2Q
**ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer**
- Homepage: https://ai.stanford.edu/~rhgao/objectfolder2.0/
- Paper: https://arxiv.org/abs/2204.02389
- Dataset: https://github.com/rhgao/ObjectFolder
- Demo: https://youtu.be/e5aToT3LkRA
**Shape from Polarization for Complex Scenes in the Wild**
- Homepage: https://chenyanglei.github.io/sfpwild/index.html
- Paper: https://arxiv.org/abs/2112.11377
- Code: https://github.com/ChenyangLEI/sfp-wild
**Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline**
- Homepage: https://zhang-pengyu.github.io/DUT-VTUAV/
- Paper: https://arxiv.org/abs/2204.04120
**TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting**
- Paper(Oral): https://arxiv.org/abs/2204.01018
- Dataset: https://svip-lab.github.io/dataset/RepCount_dataset.html
- Code: https://github.com/SvipRepetitionCounting/TransRAC
**FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment**
- Paper(Oral): https://arxiv.org/abs/2204.03646
- Dataset: https://github.com/xujinglin/FineDiving
- Code: https://github.com/xujinglin/FineDiving
- Chinese write-up: https://mp.weixin.qq.com/s/8t12Y34eMNwvJr8PeryWXg
**Aesthetic Text Logo Synthesis via Content-aware Layout Inferring**
- Paper: https://arxiv.org/abs/2204.02701
- Dataset: https://github.com/yizhiwang96/TextLogoLayout
- Code: https://github.com/yizhiwang96/TextLogoLayout
**DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection**
- Homepage: https://thudair.baai.ac.cn/index
- Paper: https://arxiv.org/abs/2204.05575
- Code: https://github.com/AIR-THU/DAIR-V2X
**A New Dataset and Transformer for Stereoscopic Video Super-Resolution**
- Paper: https://arxiv.org/abs/2204.10039
- Code: https://github.com/H-deep/Trans-SVSR/
- Dataset: http://shorturl.at/mpwGX
**Putting People in their Place: Monocular Regression of 3D People in Depth**
- Homepage: https://arthur151.github.io/BEV/BEV.html
- Paper: https://arxiv.org/abs/2112.08274
- Code: https://github.com/Arthur151/ROMP
- Dataset: https://github.com/Arthur151/Relative_Human
**UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection**
- Paper: https://arxiv.org/abs/2111.08644
- Dataset: https://github.com/lilygeorgescu/UBnormal
**DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion**
- Homepage: https://dancetrack.github.io
- Paper: https://arxiv.org/abs/2111.14690
- Dataset: https://github.com/DanceTrack/DanceTrack
**Visual Abductive Reasoning**
- Paper: https://arxiv.org/abs/2203.14040
- Code: https://github.com/leonnnop/VAR
**Large-scale Video Panoptic Segmentation in the Wild: A Benchmark**
- Paper: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset/blob/main/VIPSeg2022.pdf
- Code: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset
- Dataset: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset
**Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions**
- Homepage: https://ithaca365.mae.cornell.edu/
- Paper: https://arxiv.org/abs/2208.01166
# New Task
**Language-based Video Editing via Multi-Modal Multi-Level Transformer**
- Paper: https://arxiv.org/abs/2104.01122
- Code: None
**It's About Time: Analog Clock Reading in the Wild**
- Homepage: https://charigyang.github.io/abouttime/
- Paper: https://arxiv.org/abs/2111.09162
- Code: https://github.com/charigyang/itsabouttime
- Demo: https://youtu.be/cbiMACA6dRc
**Splicing ViT Features for Semantic Appearance Transfer**
- Homepage: https://splice-vit.github.io/
- Paper: https://arxiv.org/abs/2201.00424
- Code: https://github.com/omerbt/Splice
**Visual Abductive Reasoning**
- Paper: https://arxiv.org/abs/2203.14040
- Code: https://github.com/leonnnop/VAR
# Others
**Kubric: A scalable dataset generator**
- Paper: https://arxiv.org/abs/2203.03570
- Code: https://github.com/google-research/kubric
- Chinese write-up: https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg
**X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning**
- Paper: https://arxiv.org/abs/2203.00843
- Code: https://github.com/CurryYuan/X-Trans2Cap
**Balanced MSE for Imbalanced Visual Regression**
- Paper(Oral): https://arxiv.org/abs/2203.16427
- Code: https://github.com/jiawei-ren/BalancedMSE
**SNUG: Self-Supervised Neural Dynamic Garments**
- Homepage: http://mslab.es/projects/SNUG/
- Paper(Oral): https://arxiv.org/abs/2204.02219
- Code: https://github.com/isantesteban/snug
**Shape from Polarization for Complex Scenes in the Wild**
- Homepage: https://chenyanglei.github.io/sfpwild/index.html
- Paper: https://arxiv.org/abs/2112.11377
- Code: https://github.com/ChenyangLEI/sfp-wild
**LASER: LAtent SpacE Rendering for 2D Visual Localization**
- Paper(Oral): https://arxiv.org/abs/2204.00157
- Code: None
**Single-Photon Structured Light**
- Paper(Oral): https://arxiv.org/abs/2204.05300
- Code: None
**3DeformRS: Certifying Spatial Deformations on Point Clouds**
- Paper: https://arxiv.org/abs/2204.05687
- Code: None
**Aesthetic Text Logo Synthesis via Content-aware Layout Inferring**
- Paper: https://arxiv.org/abs/2204.02701
- Dataset: https://github.com/yizhiwang96/TextLogoLayout
- Code: https://github.com/yizhiwang96/TextLogoLayout
**Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes**
- Paper: https://arxiv.org/abs/2203.13412
- Code: https://github.com/zjsong/SSPL
**Robust and Accurate Superquadric Recovery: a Probabilistic Approach**
- Paper(Oral): https://arxiv.org/abs/2111.14517
- Code: https://github.com/bmlklwx/EMS-superquadric_fitting
**Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence**
- Paper: https://arxiv.org/abs/2203.00911
- Code: None
**Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer**
- Paper(Oral): https://arxiv.org/abs/2204.08680
- Code: https://github.com/zengwang430521/TCFormer
**DeepDPM: Deep Clustering With an Unknown Number of Clusters**
- Paper: https://arxiv.org/abs/2203.14309
- Code: https://github.com/BGU-CS-VIL/DeepDPM
**ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic**
- Paper: https://arxiv.org/abs/2111.14447
- Code: https://github.com/YoadTew/zero-shot-image-to-text
**Proto2Proto: Can you recognize the car, the way I do?**
- Paper: https://arxiv.org/abs/2204.11830
- Code: https://github.com/archmaester/proto2proto
**Putting People in their Place: Monocular Regression of 3D People in Depth**
- Homepage: https://arthur151.github.io/BEV/BEV.html
- Paper: https://arxiv.org/abs/2112.08274
- Code: https://github.com/Arthur151/ROMP
- Dataset: https://github.com/Arthur151/Relative_Human
**Light Field Neural Rendering**
- Homepage: https://light-field-neural-rendering.github.io/
- Paper(Oral): https://arxiv.org/abs/2112.09687
- Code: https://github.com/google-research/google-research/tree/master/light_field_neural_rendering
**Neural Texture Extraction and Distribution for Controllable Person Image Synthesis**
- Paper: https://arxiv.org/abs/2204.06160
- Code: https://github.com/RenYurui/Neural-Texture-Extraction-Distribution
**Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning**
- Paper: https://arxiv.org/abs/2203.14333
- Code: https://github.com/0liliulei/LIIR
================================================
FILE: CVPR2023-Papers-with-Code.md
================================================
# CVPR 2023 Papers and Open-Source Projects (Papers with Code)
A collection of [CVPR 2023](https://openaccess.thecvf.com/CVPR2023?day=all) papers and open-source projects (papers with code).
**25.78% = 2360 / 9155**
CVPR 2023 decisions are now available on OpenReview! This year, we received a record number of **9155** submissions (a 12% increase over CVPR 2022), and accepted **2360** papers, for a 25.78% acceptance rate.
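> Note 1: Issues sharing CVPR 2023 papers and open-source projects are welcome!
>
> Note 2: For papers from previous years' top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision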
>
> - [CVPR 2019](CVPR2019-Papers-with-Code.md)
> - [CVPR 2020](CVPR2020-Papers-with-Code.md)
> - [CVPR 2021](CVPR2021-Papers-with-Code.md)
> - [CVPR 2022](CVPR2022-Papers-with-Code.md)
# CVPR 2023 Papers with Code: Table of Contents
- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [MAE](#MAE)
- [GAN](#GAN)
- [GNN](#GNN)
- [MLP](#MLP)
- [NAS](#NAS)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [DETR](#DETR)
- [Prompt](#Prompt)
- [Diffusion Models](#Diffusion)
- [Avatars](#Avatars)
- [ReID (Re-Identification)](#ReID)
- [Long-Tail](#Long-Tail)
- [Vision Transformer](#Vision-Transformer)
- [Vision-Language](#VL)
- [Self-supervised Learning](#SSL)
- [Data Augmentation](#DA)
- [Object Detection](#Object-Detection)
- [Visual Tracking](#VT)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Panoptic Segmentation](#Panoptic-Segmentation)
- [Medical Image Segmentation](#MIS)
- [Video Object Segmentation](#VOS)
- [Video Instance Segmentation](#VIS)
- [Referring Image Segmentation](#RIS)
- [Image Matting](#Matting)
- [Image Editing](#Image-Editing)
- [Low-level Vision](#LLV)
- [Super-Resolution](#SR)
- [Denoising](#Denoising)
- [Deblur](#Deblur)
- [3D Point Cloud](#3D-Point-Cloud)
- [3D Object Detection](#3DOD)
- [3D Semantic Segmentation](#3DSS)
- [3D Object Tracking](#3D-Object-Tracking)
- [3D Semantic Scene Completion](#3DSSC)
- [3D Registration](#3D-Registration)
- [3D Human Pose Estimation](#3D-Human-Pose-Estimation)
- [3D Human Mesh Estimation](#3D-Human-Pose-Estimation)
- [Medical Image](#Medical-Image)
- [Image Generation](#Image-Generation)
- [Video Generation](#Video-Generation)
- [Video Understanding](#Video-Understanding)
- [Action Detection](#Action-Detection)
- [Text Detection](#Text-Detection)
- [Knowledge Distillation](#KD)
- [Model Pruning](#Pruning)
- [Image Compression](#IC)
- [Anomaly Detection](#AD)
- [3D Reconstruction](#3D-Reconstruction)
- [Depth Estimation](#Depth-Estimation)
- [Trajectory Prediction](#TP)
- [Lane Detection](#Lane-Detection)
- [Image Captioning](#Image-Captioning)
- [Visual Question Answering](#VQA)
- [Sign Language Recognition](#SLR)
- [Video Prediction](#Video-Prediction)
- [Novel View Synthesis](#NVS)
- [Zero-Shot Learning](#ZSL)
- [Stereo Matching](#Stereo-Matching)
- [Feature Matching](#Feature-Matching)
- [Scene Graph Generation](#SGG)
- [Implicit Neural Representations](#INR)
- [Image Quality Assessment](#IQA)
- [Datasets](#Datasets)
- [New Tasks](#New-Tasks)
- [Others](#Others)
# Backbone
**Integrally Pre-Trained Transformer Pyramid Networks**
- Paper: https://arxiv.org/abs/2211.12735
- Code: https://github.com/sunsmarterjie/iTPN
**Stitchable Neural Networks**
- Homepage: https://snnet.github.io/
- Paper: https://arxiv.org/abs/2302.06586
- Code: https://github.com/ziplab/SN-Net
**Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks**
- Paper: https://arxiv.org/abs/2303.03667
- Code: https://github.com/JierunChen/FasterNet
**BiFormer: Vision Transformer with Bi-Level Routing Attention**
- Paper: https://arxiv.org/abs/2303.08810
- Code: https://github.com/rayleizhu/BiFormer
**DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network**
- Paper: https://arxiv.org/abs/2303.02165
- Code: https://github.com/alibaba/lightweight-neural-architecture-search
**Vision Transformer with Super Token Sampling**
- Paper: https://arxiv.org/abs/2211.11167
- Code: https://github.com/hhb072/SViT
**Hard Patches Mining for Masked Image Modeling**
- Paper: None
- Code: None
**SMPConv: Self-moving Point Representations for Continuous Convolution**
- Paper: https://arxiv.org/abs/2304.02330
- Code: https://github.com/sangnekim/SMPConv
**Making Vision Transformers Efficient from A Token Sparsification View**
- Paper: https://arxiv.org/abs/2303.08685
- Code: https://github.com/changsn/STViT-R
# CLIP
**GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis**
- Paper: https://arxiv.org/abs/2301.12959
- Code: https://github.com/tobran/GALIP
**DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation**
- Paper: https://arxiv.org/abs/2303.06285
- Code: https://github.com/Yueming6568/DeltaEdit
# MAE
**Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders**
- Paper: https://arxiv.org/abs/2212.06785
- Code: https://github.com/ZrrSkywalker/I2P-MAE
**Generic-to-Specific Distillation of Masked Autoencoders**
- Paper: https://arxiv.org/abs/2302.14771
- Code: https://github.com/pengzhiliang/G2SD
# GAN
**DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation**
- Paper: https://arxiv.org/abs/2303.06285
- Code: https://github.com/Yueming6568/DeltaEdit
# NeRF
**NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior**
- Homepage: https://nope-nerf.active.vision/
- Paper: https://arxiv.org/abs/2212.07388
- Code: None
**Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures**
- Paper: https://arxiv.org/abs/2211.07600
- Code: https://github.com/eladrich/latent-nerf
**NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis**
- Paper: https://arxiv.org/abs/2301.08556
- Code: None
**Panoptic Lifting for 3D Scene Understanding with Neural Fields**
- Homepage: https://nihalsid.github.io/panoptic-lifting/
- Paper: https://arxiv.org/abs/2212.09802
- Code: None
**NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer**
- Homepage: https://redrock303.github.io/nerflix/
- Paper: https://arxiv.org/abs/2303.06919
- Code: None
**HNeRV: A Hybrid Neural Representation for Videos**
- Homepage: https://haochen-rye.github.io/HNeRV
- Paper: https://arxiv.org/abs/2304.02633
- Code: https://github.com/haochen-rye/HNeRV
# DETR
**DETRs with Hybrid Matching**
- Paper: https://arxiv.org/abs/2207.13080
- Code: https://github.com/HDETR
# Prompt
**Diversity-Aware Meta Visual Prompting**
- Paper: https://arxiv.org/abs/2303.08138
- Code: https://github.com/shikiw/DAM-VP
# NAS
**PA&DA: Jointly Sampling PAth and DAta for Consistent NAS**
- Paper: https://arxiv.org/abs/2302.14772
- Code: https://github.com/ShunLu91/PA-DA
# Avatars
**Structured 3D Features for Reconstructing Relightable and Animatable Avatars**
- Homepage: https://enriccorona.github.io/s3f/
- Paper: https://arxiv.org/abs/2212.06820
- Code: None
- Demo: https://www.youtube.com/watch?v=mcZGcQ6L-2s
**Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos**
- Homepage: https://augmentedperception.github.io/monoavatar/
- Paper: https://arxiv.org/abs/2304.01436
# ReID (Re-Identification)
**Clothing-Change Feature Augmentation for Person Re-Identification**
- Paper: None
- Code: None
**MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID**
- Paper: https://arxiv.org/abs/2303.07065
- Code: https://github.com/vimar-gu/MSINet
**Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification**
- Paper: https://arxiv.org/abs/2304.04205
- Code: None
**Large-scale Training Data Search for Object Re-identification**
- Paper: https://arxiv.org/abs/2303.16186
- Code: https://github.com/yorkeyao/SnP
# Diffusion Models
**Video Probabilistic Diffusion Models in Projected Latent Space**
- Homepage: https://sihyun.me/PVDM/
- Paper: https://arxiv.org/abs/2302.07685
- Code: https://github.com/sihyun-yu/PVDM
**Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models**
- Paper: https://arxiv.org/abs/2211.10655
- Code: None
**Imagic: Text-Based Real Image Editing with Diffusion Models**
- Homepage: https://imagic-editing.github.io/
- Paper: https://arxiv.org/abs/2210.09276
- Code: None
**Parallel Diffusion Models of Operator and Image for Blind Inverse Problems**
- Paper: https://arxiv.org/abs/2211.10656
- Code: None
**DiffRF: Rendering-guided 3D Radiance Field Diffusion**
- Homepage: https://sirwyver.github.io/DiffRF/
- Paper: https://arxiv.org/abs/2212.01206
- Code: None
**MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation**
- Paper: https://arxiv.org/abs/2212.09478
- Code: https://github.com/researchmm/MM-Diffusion
**HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising**
- Homepage: https://aminshabani.github.io/housediffusion/
- Paper: https://arxiv.org/abs/2211.13287
- Code: https://github.com/aminshabani/house_diffusion
**TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets**
- Paper: https://arxiv.org/abs/2303.05762
- Code: https://github.com/chenweixin107/TrojDiff
**Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption**
- Paper: https://arxiv.org/abs/2207.03442
- Code: https://github.com/shiyegao/DDA
**DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration**
- Paper: https://arxiv.org/abs/2303.06885
- Code: None
**Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion**
- Homepage: https://nv-tlabs.github.io/trace-pace/
- Paper: https://arxiv.org/abs/2304.01893
- Code: None
**Generative Diffusion Prior for Unified Image Restoration and Enhancement**
- Paper: https://arxiv.org/abs/2304.01247
- Code: None
**Conditional Image-to-Video Generation with Latent Flow Diffusion Models**
- Paper: https://arxiv.org/abs/2303.13744
- Code: https://github.com/nihaomiao/CVPR23_LFDM
# Long-Tail
**Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation**
- Paper: https://arxiv.org/abs/2304.01279
- Code: None
# Vision Transformer
**Integrally Pre-Trained Transformer Pyramid Networks**
- Paper: https://arxiv.org/abs/2211.12735
- Code: https://github.com/sunsmarterjie/iTPN
**Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors**
- Homepage: https://niessnerlab.org/projects/hou2023mask3d.html
- Paper: https://arxiv.org/abs/2302.14746
- Code: None
**Learning Trajectory-Aware Transformer for Video Super-Resolution**
- Paper: https://arxiv.org/abs/2204.04216
- Code: https://github.com/researchmm/TTVSR
**Vision Transformers are Parameter-Efficient Audio-Visual Learners**
- Homepage: https://yanbo.ml/project_page/LAVISH/
- Code: https://github.com/GenjiB/LAVISH
**Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes**
- Paper: https://arxiv.org/abs/2303.04249
- Code: None
**DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets**
- Paper: https://arxiv.org/abs/2301.06051
- Code: https://github.com/Haiyang-W/DSVT
**DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting**
- Paper: https://arxiv.org/abs/2211.10772
- Code: https://github.com/ViTAE-Transformer/DeepSolo
**BiFormer: Vision Transformer with Bi-Level Routing Attention**
- Paper: https://arxiv.org/abs/2303.08810
- Code: https://github.com/rayleizhu/BiFormer
**Vision Transformer with Super Token Sampling**
- Paper: https://arxiv.org/abs/2211.11167
- Code: https://github.com/hhb072/SViT
**BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision**
- Paper: https://arxiv.org/abs/2211.10439
- Code: None
**BAEFormer: Bi-directional and Early Interaction Transformers for Bird’s Eye View Semantic Segmentation**
- Paper: None
- Code: None
**Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention**
- Paper: https://arxiv.org/abs/2304.03282
- Code: None
**Making Vision Transformers Efficient from A Token Sparsification View**
- Paper: https://arxiv.org/abs/2303.08685
- Code: https://github.com/changsn/STViT-R
# Vision-Language
**GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods**
- Paper: https://arxiv.org/abs/2301.01893
- Code: None
**Teaching Structured Vision&Language Concepts to Vision&Language Models**
- Paper: https://arxiv.org/abs/2211.11733
- Code: None
**Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks**
- Paper: https://arxiv.org/abs/2211.09808
- Code: https://github.com/fundamentalvision/Uni-Perceiver
**Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training**
- Paper: https://arxiv.org/abs/2303.00040
- Code: None
**CapDet: Unifying Dense Captioning and Open-World Detection Pretraining**
- Paper: https://arxiv.org/abs/2303.02489
- Code: None
**Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding**
- Homepage: https://rllab-snu.github.io/projects/Meta-Explore/doc.html
- Paper: https://arxiv.org/abs/2303.04077
- Code: None
**All in One: Exploring Unified Video-Language Pre-training**
- Paper: https://arxiv.org/abs/2203.07303
- Code: https://github.com/showlab/all-in-one
**Position-guided Text Prompt for Vision Language Pre-training**
- Paper: https://arxiv.org/abs/2212.09737
- Code: https://github.com/sail-sg/ptp
**EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding**
- Paper: https://arxiv.org/abs/2209.14941
- Code: https://github.com/yanmin-wu/EDA
**FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks**
- Paper: https://arxiv.org/abs/2303.02483
- Code: https://github.com/BrandonHanx/FAME-ViL
**Align and Attend: Multimodal Summarization with Dual Contrastive Losses**
- Homepage: https://boheumd.github.io/A2Summ/
- Paper: https://arxiv.org/abs/2303.07284
- Code: https://github.com/boheumd/A2Summ
**Multi-Modal Representation Learning with Text-Driven Soft Masks**
- Paper: https://arxiv.org/abs/2304.00719
- Code: None
**Learning to Name Classes for Vision and Language Models**
- Paper: https://arxiv.org/abs/2304.01830
- Code: None
# Object Detection
**YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors**
- Paper: https://arxiv.org/abs/2207.02696
- Code: https://github.com/WongKinYiu/yolov7
**DETRs with Hybrid Matching**
- Paper: https://arxiv.org/abs/2207.13080
- Code: https://github.com/HDETR
**Enhanced Training of Query-Based Object Detection via Selective Query Recollection**
- Paper: https://arxiv.org/abs/2212.07593
- Code: https://github.com/Fangyi-Chen/SQR
**Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection**
- Paper: https://arxiv.org/abs/2303.05892
- Code: https://github.com/LutingWang/OADP
# Object Tracking
**Simple Cues Lead to a Strong Multi-Object Tracker**
- Paper: https://arxiv.org/abs/2206.04656
- Code: None
**Joint Visual Grounding and Tracking with Natural Language Specification**
- Paper: https://arxiv.org/abs/2303.12027
- Code: https://github.com/lizhou-cs/JointNLT
# Semantic Segmentation
**Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos**
- Paper: https://arxiv.org/abs/2303.07224
- Code: https://github.com/THU-LYJ-Lab/AR-Seg
**FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding**
- Paper: https://arxiv.org/abs/2304.02135
- Code: https://github.com/uark-cviu/FREDOM
# Medical Image Segmentation
**Label-Free Liver Tumor Segmentation**
- Paper: https://arxiv.org/abs/2303.14869
- Code: https://github.com/MrGiovanni/SyntheticTumors
**Directional Connectivity-based Segmentation of Medical Images**
- Paper: https://arxiv.org/abs/2304.00145
- Code: https://github.com/Zyun-Y/DconnNet
**Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation**
- Paper: https://arxiv.org/abs/2305.00673
- Code: https://github.com/DeepMed-Lab-ECNU/BCP
**Devil is in the Queries: Advancing Mask Transformers for Real-world Medical Image Segmentation and Out-of-Distribution Localization**
- Paper: https://arxiv.org/abs/2304.00212
- Code: None
**Fair Federated Medical Image Segmentation via Client Contribution Estimation**
- Paper: https://arxiv.org/abs/2303.16520
- Code: https://github.com/NVIDIA/NVFlare/tree/dev/research/fed-ce
**Ambiguous Medical Image Segmentation using Diffusion Models**
- Homepage: https://aimansnigdha.github.io/cimd/
- Paper: https://arxiv.org/abs/2304.04745
- Code: https://github.com/aimansnigdha/Ambiguous-Medical-Image-Segmentation-using-Diffusion-Models
**Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation**
- Paper: https://arxiv.org/abs/2303.13090
- Code: https://github.com/HengCai-NJU/DeSCO
**MagicNet: Semi-Supervised Multi-Organ Segmentation via Magic-Cube Partition and Recovery**
- Paper: https://arxiv.org/abs/2301.01767
- Code: https://github.com/DeepMed-Lab-ECNU/MagicNet
**MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation**
- Paper: https://openaccess.thecvf.com/content/CVPR2023/html/Wang_MCF_Mutual_Correction_Framework_for_Semi-Supervised_Medical_Image_Segmentation_CVPR_2023_paper.html
- Code: https://github.com/WYC-321/MCF
**Rethinking Few-Shot Medical Segmentation: A Vector Quantization View**
- Paper: https://openaccess.thecvf.com/content/CVPR2023/html/Huang_Rethinking_Few-Shot_Medical_Segmentation_A_Vector_Quantization_View_CVPR_2023_paper.html
- Code: None
**Pseudo-label Guided Contrastive Learning for Semi-supervised Medical Image Segmentation**
- Paper: https://openaccess.thecvf.com/content/CVPR2023/html/Basak_Pseudo-Label_Guided_Contrastive_Learning_for_Semi-Supervised_Medical_Image_Segmentation_CVPR_2023_paper.html
- Code: https://github.com/hritam-98/PatchCL-MedSeg
**SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation**
- Paper: https://arxiv.org/abs/2305.11012
- Code: None
**DoNet: Deep De-overlapping Network for Cytology Instance Segmentation**
- Paper: https://arxiv.org/abs/2303.14373
- Code: https://github.com/DeepDoNet/DoNet
# Video Object Segmentation
**Two-shot Video Object Segmentation**
- Paper: https://arxiv.org/abs/2303.12078
- Code: https://github.com/yk-pku/Two-shot-Video-Object-Segmentation
**MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation**
- Paper: https://arxiv.org/abs/2303.07815
- Code: None
# Video Instance Segmentation
**Mask-Free Video Instance Segmentation**
- Paper: https://arxiv.org/abs/2303.15904
- Code: https://github.com/SysCV/MaskFreeVis
# Referring Image Segmentation
**PolyFormer: Referring Image Segmentation as Sequential Polygon Generation**
- Paper: https://arxiv.org/abs/2302.07387
- Code: None
# 3D Point Cloud
**Physical-World Optical Adversarial Attacks on 3D Face Recognition**
- Paper: https://arxiv.org/abs/2205.13412
- Code: https://github.com/PolyLiYJ/SLAttack.git
**IterativePFN: True Iterative Point Cloud Filtering**
- Paper: https://arxiv.org/abs/2304.01529
- Code: https://github.com/ddsediri/IterativePFN
**Attention-based Point Cloud Edge Sampling**
- Homepage: https://junweizheng93.github.io/publications/APES/APES.html
- Paper: https://arxiv.org/abs/2302.14673
- Code: https://github.com/JunweiZheng93/APES
# 3D Object Detection
**DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets**
- Paper: https://arxiv.org/abs/2301.06051
- Code: https://github.com/Haiyang-W/DSVT
**FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection**
- Paper: https://arxiv.org/abs/2301.04467
- Code: None
**3D Video Object Detection with Learnable Object-Centric Global Optimization**
- Paper: None
- Code: None
**Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection**
- Paper: https://arxiv.org/abs/2304.01464
- Code: https://github.com/azhuantou/HSSDA
# 3D Semantic Segmentation
**Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation**
- Paper: https://arxiv.org/abs/2303.11203
- Code: https://github.com/l1997i/lim3d
# 3D Semantic Scene Completion
**VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion**
- Paper: https://arxiv.org/abs/2302.12251
- Code: https://github.com/NVlabs/VoxFormer
# 3D Registration
**Robust Outlier Rejection for 3D Registration with Variational Bayes**
- Paper: https://arxiv.org/abs/2304.01514
- Code: https://github.com/Jiang-HB/VBReg
# 3D Human Pose Estimation
# 3D Human Mesh Estimation
**3D Human Mesh Estimation from Virtual Markers**
- Paper: https://arxiv.org/abs/2303.11726
- Code: https://github.com/ShirleyMaxx/VirtualMarker
# Low-level Vision
**Causal-IR: Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective**
- Paper: https://arxiv.org/abs/2303.06859
- Code: https://github.com/lixinustc/Casual-IR-DIL
**Burstormer: Burst Image Restoration and Enhancement Transformer**
- Paper: https://arxiv.org/abs/2304.01194
- Code: http://github.com/akshaydudhane16/Burstormer
# Super-Resolution
**Super-Resolution Neural Operator**
- Paper: https://arxiv.org/abs/2303.02584
- Code: https://github.com/2y7c3/Super-Resolution-Neural-Operator
## Video Super-Resolution
**Learning Trajectory-Aware Transformer for Video Super-Resolution**
- Paper: https://arxiv.org/abs/2204.04216
- Code: https://github.com/researchmm/TTVSR
# Denoising
## Image Denoising
**Masked Image Training for Generalizable Deep Image Denoising**
- Paper: https://arxiv.org/abs/2303.13132
- Code: https://github.com/haoyuc/MaskedDenoising
# Image Generation
**GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis**
- Paper: https://arxiv.org/abs/2301.12959
- Code: https://github.com/tobran/GALIP
**MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis**
- Paper: https://arxiv.org/abs/2211.09117
- Code: https://github.com/LTH14/mage
**Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation**
- Paper: https://arxiv.org/abs/2304.01816
- Code: None
**Few-shot Semantic Image Synthesis with Class Affinity Transfer**
- Paper: https://arxiv.org/abs/2304.02321
- Code: None
**TopNet: Transformer-based Object Placement Network for Image Compositing**
- Paper: https://arxiv.org/abs/2304.03372
- Code: None
# Video Generation
**MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation**
- Paper: https://arxiv.org/abs/2212.09478
- Code: https://github.com/researchmm/MM-Diffusion
**Conditional Image-to-Video Generation with Latent Flow Diffusion Models**
- Paper: https://arxiv.org/abs/2303.13744
- Code: https://github.com/nihaomiao/CVPR23_LFDM
# Video Understanding
**Learning Transferable Spatiotemporal Representations from Natural Script Knowledge**
- Paper: https://arxiv.org/abs/2209.15280
- Code: https://github.com/TencentARC/TVTS
**Frame Flexible Network**
- Paper: https://arxiv.org/abs/2303.14817
- Code: https://github.com/BeSpontaneous/FFN
**Masked Motion Encoding for Self-Supervised Video Representation Learning**
- Paper: https://arxiv.org/abs/2210.06096
- Code: https://github.com/XinyuSun/MME
**MARLIN: Masked Autoencoder for facial video Representation LearnING**
- Paper: https://arxiv.org/abs/2211.06627
- Code: https://github.com/ControlNet/MARLIN
# Action Detection
**TriDet: Temporal Action Detection with Relative Boundary Modeling**
- Paper: https://arxiv.org/abs/2303.07347
- Code: https://github.com/dingfengshi/TriDet
# Text Detection
**DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting**
- Paper: https://arxiv.org/abs/2211.10772
- Code: https://github.com/ViTAE-Transformer/DeepSolo
# Knowledge Distillation
**Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation**
- Paper: https://arxiv.org/abs/2302.14290
- Code: None
**Generic-to-Specific Distillation of Masked Autoencoders**
- Paper: https://arxiv.org/abs/2302.14771
- Code: https://github.com/pengzhiliang/G2SD
# Model Pruning
**DepGraph: Towards Any Structural Pruning**
- Paper: https://arxiv.org/abs/2301.12900
- Code: https://github.com/VainF/Torch-Pruning
# Image Compression
**Context-Based Trit-Plane Coding for Progressive Image Compression**
- Paper: https://arxiv.org/abs/2303.05715
- Code: https://github.com/seungminjeon-github/CTC
# Anomaly Detection
**Deep Feature In-painting for Unsupervised Anomaly Detection in X-ray Images**
- Paper: https://arxiv.org/abs/2111.13495
- Code: https://github.com/tiangexiang/SQUID
# 3D Reconstruction
**OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields**
- Paper: https://arxiv.org/abs/2211.12886
- Code: None
**SparsePose: Sparse-View Camera Pose Regression and Refinement**
- Paper: https://arxiv.org/abs/2211.16991
- Code: None
**NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction**
- Paper: https://arxiv.org/abs/2303.02375
- Code: None
**Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition**
- Homepage: https://moygcc.github.io/vid2avatar/
- Paper: https://arxiv.org/abs/2302.11566
- Code: https://github.com/MoyGcc/vid2avatar
- Demo: https://youtu.be/EGi47YeIeGQ
**To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision**
- Paper: https://arxiv.org/abs/2106.09614
- Code: https://github.com/unibas-gravis/Occlusion-Robust-MoFA
**Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction**
- Paper: https://arxiv.org/abs/2303.05937
- Code: None
**3D Cinemagraphy from a Single Image**
- Homepage: https://xingyi-li.github.io/3d-cinemagraphy/
- Paper: https://arxiv.org/abs/2303.05724
- Code: https://github.com/xingyi-li/3d-cinemagraphy
**Revisiting Rotation Averaging: Uncertainties and Robust Losses**
- Paper: https://arxiv.org/abs/2303.05195
- Code: https://github.com/zhangganlin/GlobalSfMpy
**FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction**
- Paper: https://arxiv.org/abs/2211.13874
- Code: https://github.com/csbhr/FFHQ-UV
**A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images**
- Homepage: https://younglbw.github.io/HRN-homepage/
- Paper: https://arxiv.org/abs/2302.14434
- Code: https://github.com/youngLBW/HRN
# Depth Estimation
**Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation**
- Paper: https://arxiv.org/abs/2211.13202
- Code: https://github.com/noahzn/Lite-Mono
# Trajectory Prediction
**IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction**
- Paper: https://arxiv.org/abs/2303.00575
- Code: None
**EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning**
- Paper: https://arxiv.org/abs/2303.10876
- Code: https://github.com/MediaBrain-SJTU/EqMotion
# Lane Detection
**Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection**
- Paper: https://arxiv.org/abs/2301.02371
- Code: https://github.com/tusen-ai/Anchor3DLane
**BEV-LaneDet: An Efficient 3D Lane Detection Based on Virtual Camera via Key-Points**
- Paper: https://arxiv.org/abs/2210.06006v3
- Code: https://github.com/gigo-team/bev_lane_det
# Image Captioning
**ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing**
- Paper: https://arxiv.org/abs/2303.02437
- Code: None
**Cross-Domain Image Captioning with Discriminative Finetuning**
- Paper: https://arxiv.org/abs/2304.01662
- Code: None
**Model-Agnostic Gender Debiased Image Captioning**
- Paper: https://arxiv.org/abs/2304.03693
- Code: None
# Visual Question Answering
**MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering**
- Paper: https://arxiv.org/abs/2303.01239
- Code: https://github.com/jingjing12110/MixPHM
# Sign Language Recognition
**Continuous Sign Language Recognition with Correlation Network**
- Paper: https://arxiv.org/abs/2303.03202
- Code: https://github.com/hulianyuyy/CorrNet
# Video Prediction
**MOSO: Decomposing MOtion, Scene and Object for Video Prediction**
- Paper: https://arxiv.org/abs/2303.03684
- Code: https://github.com/anonymous202203/MOSO
# Novel View Synthesis
**3D Video Loops from Asynchronous Input**
- Homepage: https://limacv.github.io/VideoLoop3D_web/
- Paper: https://arxiv.org/abs/2303.05312
- Code: https://github.com/limacv/VideoLoop3D
# Zero-Shot Learning
**Bi-directional Distribution Alignment for Transductive Zero-Shot Learning**
- Paper: https://arxiv.org/abs/2303.08698
- Code: https://github.com/Zhicaiwww/Bi-VAEGAN
**Semantic Prompt for Few-Shot Learning**
- Paper: None
- Code: None
# Stereo Matching
**Iterative Geometry Encoding Volume for Stereo Matching**
- Paper: https://arxiv.org/abs/2303.06615
- Code: https://github.com/gangweiX/IGEV
**Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation**
- Paper: https://arxiv.org/abs/2304.00152
- Code: None
# Feature Matching
**Adaptive Spot-Guided Transformer for Consistent Local Feature Matching**
- Homepage: https://astr2023.github.io/
- Paper: https://arxiv.org/abs/2303.16624
- Code: https://github.com/ASTR2023/ASTR
# Scene Graph Generation
**Prototype-based Embedding Network for Scene Graph Generation**
- Paper: https://arxiv.org/abs/2303.07096
- Code: None
# Implicit Neural Representations
**Polynomial Implicit Neural Representations For Large Diverse Datasets**
- Paper: https://arxiv.org/abs/2303.11424
- Code: https://github.com/Rajhans0/Poly_INR
# Image Quality Assessment
**Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild**
- Paper: https://arxiv.org/abs/2304.00451
- Code: None
# Datasets
**Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes**
- Paper: https://arxiv.org/abs/2303.02760
- Code: None
**Align and Attend: Multimodal Summarization with Dual Contrastive Losses**
- Homepage: https://boheumd.github.io/A2Summ/
- Paper: https://arxiv.org/abs/2303.07284
- Code: https://github.com/boheumd/A2Summ
**GeoNet: Benchmarking Unsupervised Adaptation across Geographies**
- Homepage: https://tarun005.github.io/GeoNet/
- Paper: https://arxiv.org/abs/2303.15443
**CelebV-Text: A Large-Scale Facial Text-Video Dataset**
- Homepage: https://celebv-text.github.io/
- Paper: https://arxiv.org/abs/2303.14717
# Others
**Interactive Segmentation as Gaussian Process Classification**
- Paper: https://arxiv.org/abs/2302.14578
- Code: None
**Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger**
- Paper: https://arxiv.org/abs/2302.14677
- Code: None
**SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries**
- Homepage: http://bit.ly/splinecam
- Paper: https://arxiv.org/abs/2302.12828
- Code: None
**SCOTCH and SODA: A Transformer Video Shadow Detection Framework**
- Paper: https://arxiv.org/abs/2211.06885
- Code: None
**DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization**
- Homepage: https://ai4ce.github.io/DeepMapping2/
- Paper: https://arxiv.org/abs/2212.06331
- Code: https://github.com/ai4ce/DeepMapping2
**Token Turing Machines**
- Paper: https://arxiv.org/abs/2211.09119
- Code: None
**Single Image Backdoor Inversion via Robust Smoothed Classifiers**
- Paper: https://arxiv.org/abs/2303.00215
- Code: https://github.com/locuslab/smoothinv
**To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision**
- Paper: https://arxiv.org/abs/2106.09614
- Code: https://github.com/unibas-gravis/Occlusion-Robust-MoFA
**HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics**
- Homepage: https://dolorousrtur.github.io/hood/
- Paper: https://arxiv.org/abs/2212.07242
- Code: https://github.com/dolorousrtur/hood
- Demo: https://www.youtube.com/watch?v=cBttMDPrUYY
**A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others**
- Paper: https://arxiv.org/abs/2212.04825
- Code: https://github.com/facebookresearch/Whac-A-Mole.git
**RelightableHands: Efficient Neural Relighting of Articulated Hand Models**
- Homepage: https://sh8.io/#/relightable_hands
- Paper: https://arxiv.org/abs/2302.04866
- Code: None
- Demo: https://sh8.io/static/media/teacher_video.923d87957fe0610730c2.mp4
**Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation**
- Paper: https://arxiv.org/abs/2303.00914
- Code: None
**Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression**
- Paper: https://arxiv.org/abs/2303.01052
- Code: None
**UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy**
- Paper: https://arxiv.org/abs/2303.00938
- Code: None
**Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness**
- Paper: https://arxiv.org/abs/2303.00971
- Code: https://github.com/zhijieshen-bjtu/DOPNet
**Learning Neural Parametric Head Models**
- Homepage: https://simongiebenhain.github.io/NPHM
- Paper: https://arxiv.org/abs/2212.02761
- Code: None
**A Meta-Learning Approach to Predicting Performance and Data Requirements**
- Paper: https://arxiv.org/abs/2303.01598
- Code: None
**MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision**
- Homepage: https://imagine.enpc.fr/~guedona/MACARONS/
- Paper: https://arxiv.org/abs/2303.03315
- Code: None
**Masked Images Are Counterfactual Samples for Robust Fine-tuning**
- Paper: https://arxiv.org/abs/2303.03052
- Code: None
**HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling**
- Paper: https://arxiv.org/abs/2303.02700
- Code: None
**Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization**
- Paper: https://arxiv.org/abs/2303.02328
- Code: None
**Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization**
- Paper: https://arxiv.org/abs/2303.03108
- Code: None
**Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples**
- Paper: https://arxiv.org/abs/2301.01217
- Code: https://github.com/jiamingzhang94/Unlearnable-Clusters
**Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes**
- Paper: https://arxiv.org/abs/2303.04249
- Code: None
**UniHCP: A Unified Model for Human-Centric Perceptions**
- Paper: https://arxiv.org/abs/2303.02936
- Code: https://github.com/OpenGVLab/UniHCP
**CUDA: Convolution-based Unlearnable Datasets**
- Paper: https://arxiv.org/abs/2303.04278
- Code: https://github.com/vinusankars/Convolution-based-Unlearnability
**AdaptiveMix: Robust Feature Representation via Shrinking Feature Space**
- Paper: https://arxiv.org/abs/2303.01559
- Code: https://github.com/WentianZhang-ML/AdaptiveMix
**Physical-World Optical Adversarial Attacks on 3D Face Recognition**
- Paper: https://arxiv.org/abs/2205.13412
- Code: https://github.com/PolyLiYJ/SLAttack.git
**DPE: Disentanglement of Pose and Expression for General Video Portrait Editing**
- Paper: https://arxiv.org/abs/2301.06281
- Code: https://carlyx.github.io/DPE/
**SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation**
- Paper: https://arxiv.org/abs/2211.12194
- Code: https://github.com/Winfredy/SadTalker
**Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models**
- Paper: None
- Code: None
**Sharpness-Aware Gradient Matching for Domain Generalization**
- Paper: None
- Code: https://github.com/Wang-pengfei/SAGM
**Mind the Label-shift for Augmentation-based Graph Out-of-distribution Generalization**
- Paper: None
- Code: None
**Blind Video Deflickering by Neural Filtering with a Flawed Atlas**
- Homepage: https://chenyanglei.github.io/deflicker
- Paper: None
- Code: None
**RiDDLE: Reversible and Diversified De-identification with Latent Encryptor**
- Paper: None
- Code: https://github.com/ldz666666/RiDDLE
**PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation**
- Paper: https://arxiv.org/abs/2303.07337
- Code: None
**Upcycling Models under Domain and Category Shift**
- Paper: https://arxiv.org/abs/2303.07110
- Code: https://github.com/ispc-lab/GLC
**Modality-Agnostic Debiasing for Single Domain Generalization**
- Paper: https://arxiv.org/abs/2303.07123
- Code: None
**Progressive Open Space Expansion for Open-Set Model Attribution**
- Paper: https://arxiv.org/abs/2303.06877
- Code: None
**Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies**
- Paper: https://arxiv.org/abs/2303.06856
- Code: None
**GFPose: Learning 3D Human Pose Prior with Gradient Fields**
- Paper: https://arxiv.org/abs/2212.08641
- Code: https://github.com/Embracing/GFPose
**PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment**
- Paper: https://arxiv.org/abs/2303.11526
- Code: https://github.com/Zhang-VISLab
**Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings**
- Paper: https://arxiv.org/abs/2303.11502
- Code: None
**Boundary Unlearning**
- Paper: https://arxiv.org/abs/2303.11570
- Code: None
**ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing**
- Paper: https://arxiv.org/abs/2303.17096
- Code: https://github.com/alibaba/easyrobust
**Zero-shot Model Diagnosis**
- Paper: https://arxiv.org/abs/2303.15441
- Code: None
**GeoNet: Benchmarking Unsupervised Adaptation across Geographies**
- Homepage: https://tarun005.github.io/GeoNet/
- Paper: https://arxiv.org/abs/2303.15443
**Quantum Multi-Model Fitting**
- Paper: https://arxiv.org/abs/2303.15444
- Code: https://github.com/FarinaMatteo/qmmf
**DivClust: Controlling Diversity in Deep Clustering**
- Paper: https://arxiv.org/abs/2304.01042
- Code: None
**Neural Volumetric Memory for Visual Locomotion Control**
- Homepage: https://rchalyang.github.io/NVM
- Paper: https://arxiv.org/abs/2304.01201
- Code: https://rchalyang.github.io/NVM
**MonoHuman: Animatable Human Neural Field from Monocular Video**
- Homepage: https://yzmblog.github.io/projects/MonoHuman/
- Paper: https://arxiv.org/abs/2304.02001
- Code: https://github.com/Yzmblog/MonoHuman
**Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion**
- Homepage: https://nv-tlabs.github.io/trace-pace/
- Paper: https://arxiv.org/abs/2304.01893
- Code: None
**Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification**
- Paper: https://arxiv.org/abs/2304.01804
- Code: None
**HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering**
- Paper: https://arxiv.org/abs/2304.01686
- Code: None
**On the Stability-Plasticity Dilemma of Class-Incremental Learning**
- Paper: https://arxiv.org/abs/2304.01663
- Code: None
**Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning**
- Paper: https://arxiv.org/abs/2304.01482
- Code: None
**VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution**
- Paper: https://arxiv.org/abs/2304.01434
- Code: https://github.com/jaeill/CVPR23-VNE
**Detecting and Grounding Multi-Modal Media Manipulation**
- Homepage: https://rshaojimmy.github.io/Projects/MultiModal-DeepFake
- Paper: https://arxiv.org/abs/2304.02556
- Code: https://github.com/rshaojimmy/MultiModal-DeepFake
**Meta-causal Learning for Single Domain Generalization**
- Paper: https://arxiv.org/abs/2304.03709
- Code: None
**Disentangling Writer and Character Styles for Handwriting Generation**
- Paper: https://arxiv.org/abs/2303.14736
- Code: https://github.com/dailenson/SDT
**DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects**
- Homepage: https://www.chenbao.tech/dexart/
- Code: https://github.com/Kami-code/dexart-release
**Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision**
- Homepage: https://toytiny.github.io/publication/23-cmflow-cvpr/index.html
- Paper: https://arxiv.org/abs/2303.00462
- Code: https://github.com/Toytiny/CMFlow
**Marching-Primitives: Shape Abstraction from Signed Distance Function**
- Paper: https://arxiv.org/abs/2303.13190
- Code: https://github.com/ChirikjianLab/Marching-Primitives
**Towards Trustable Skin Cancer Diagnosis via Rewriting Model's Decision**
- Paper: https://arxiv.org/abs/2303.00885
- Code: None
================================================
FILE: CVPR2024-Papers-with-Code.md
================================================
# CVPR 2024 Papers and Open-Source Projects (Papers with Code)
CVPR 2024 decisions are now available on OpenReview!
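> Note 1: Contributions are welcome. Please submit an issue to share CVPR 2024 papers and open-source projects!
>
> Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision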
>
> - [ECCV 2024](https://github.com/amusi/ECCV2024-Papers-with-Code)
> - [CVPR 2023](CVPR2023-Papers-with-Code.md)
# CVPR 2024 Papers with Open-Source Code: Table of Contents
- [3DGS(Gaussian Splatting)](#3DGS)
- [Avatars](#Avatars)
- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [MAE](#MAE)
- [Embodied AI](#Embodied-AI)
- [GAN](#GAN)
- [GNN](#GNN)
- [Multimodal Large Language Models (MLLM)](#MLLM)
- [Large Language Models (LLM)](#LLM)
- [NAS](#NAS)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [DETR](#DETR)
- [Prompt](#Prompt)
- [Diffusion Models](#Diffusion)
- [ReID (Re-Identification)](#ReID)
- [Long-Tail Distribution](#Long-Tail)
- [Vision Transformer](#Vision-Transformer)
- [Vision-Language](#VL)
- [Self-supervised Learning](#SSL)
- [Data Augmentation](#DA)
- [Object Detection](#Object-Detection)
- [Anomaly Detection](#Anomaly-Detection)
- [Visual Tracking](#VT)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Panoptic Segmentation](#Panoptic-Segmentation)
- [Medical Image](#MI)
- [Medical Image Segmentation](#MIS)
- [Video Object Segmentation](#VOS)
- [Video Instance Segmentation](#VIS)
- [Referring Image Segmentation](#RIS)
- [Image Matting](#Matting)
- [Image Editing](#Image-Editing)
- [Low-level Vision](#LLV)
- [Super-Resolution](#SR)
- [Denoising](#Denoising)
- [Deblur](#Deblur)
- [Autonomous Driving](#Autonomous-Driving)
- [3D Point Cloud](#3D-Point-Cloud)
- [3D Object Detection](#3DOD)
- [3D Semantic Segmentation](#3DSS)
- [3D Object Tracking](#3D-Object-Tracking)
- [3D Semantic Scene Completion](#3DSSC)
- [3D Registration](#3D-Registration)
- [3D Human Pose Estimation](#3D-Human-Pose-Estimation)
- [3D Human Mesh Estimation](#3D-Human-Pose-Estimation)
- [Medical Image](#Medical-Image)
- [Image Generation](#Image-Generation)
- [Video Generation](#Video-Generation)
- [3D Generation](#3D-Generation)
- [Video Understanding](#Video-Understanding)
- [Action Detection](#Action-Detection)
- [Text Detection](#Text-Detection)
- [Knowledge Distillation](#KD)
- [Model Pruning](#Pruning)
- [Image Compression](#IC)
- [3D Reconstruction](#3D-Reconstruction)
- [Depth Estimation](#Depth-Estimation)
- [Trajectory Prediction](#TP)
- [Lane Detection](#Lane-Detection)
- [Image Captioning](#Image-Captioning)
- [Visual Question Answering](#VQA)
- [Sign Language Recognition](#SLR)
- [Video Prediction](#Video-Prediction)
- [Novel View Synthesis](#NVS)
- [Zero-Shot Learning](#ZSL)
- [Stereo Matching](#Stereo-Matching)
- [Feature Matching](#Feature-Matching)
- [Scene Graph Generation](#SGG)
- [Implicit Neural Representations](#INR)
- [Image Quality Assessment](#IQA)
- [Video Quality Assessment](#Video-Quality-Assessment)
- [Datasets](#Datasets)
- [New Tasks](#New-Tasks)
- [Others](#Others)
# 3DGS(Gaussian Splatting)
**Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering**
- Homepage: https://city-super.github.io/scaffold-gs/
- Paper: https://arxiv.org/abs/2312.00109
- Code: https://github.com/city-super/Scaffold-GS
**GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis**
- Homepage: https://shunyuanzheng.github.io/GPS-Gaussian
- Paper: https://arxiv.org/abs/2312.02155
- Code: https://github.com/ShunyuanZheng/GPS-Gaussian
**GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians**
- Paper: https://arxiv.org/abs/2312.02134
- Code: https://github.com/huliangxiao/GaussianAvatar
**GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting**
- Paper: https://arxiv.org/abs/2311.14521
- Code: https://github.com/buaacyw/GaussianEditor
**Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction**
- Homepage: https://ingra14m.github.io/Deformable-Gaussians/
- Paper: https://arxiv.org/abs/2309.13101
- Code: https://github.com/ingra14m/Deformable-3D-Gaussians
**SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes**
- Homepage: https://yihua7.github.io/SC-GS-web/
- Paper: https://arxiv.org/abs/2312.14937
- Code: https://github.com/yihua7/SC-GS
**Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis**
- Homepage: https://oppo-us-research.github.io/SpacetimeGaussians-website/
- Paper: https://arxiv.org/abs/2312.16812
- Code: https://github.com/oppo-us-research/SpacetimeGaussians
**DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization**
- Homepage: https://fictionarry.github.io/DNGaussian/
- Paper: https://arxiv.org/abs/2403.06912
- Code: https://github.com/Fictionarry/DNGaussian
**4D Gaussian Splatting for Real-Time Dynamic Scene Rendering**
- Paper: https://arxiv.org/abs/2310.08528
- Code: https://github.com/hustvl/4DGaussians
**GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models**
- Paper: https://arxiv.org/abs/2310.08529
- Code: https://github.com/hustvl/GaussianDreamer
# Avatars
**GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians**
- Paper: https://arxiv.org/abs/2312.02134
- Code: https://github.com/huliangxiao/GaussianAvatar
**Real-Time Simulated Avatar from Head-Mounted Sensors**
- Homepage: https://www.zhengyiluo.com/SimXR/
- Paper: https://arxiv.org/abs/2403.06862
# Backbone
**RepViT: Revisiting Mobile CNN From ViT Perspective**
- Paper: https://arxiv.org/abs/2307.09283
- Code: https://github.com/THU-MIG/RepViT
**TransNeXt: Robust Foveal Visual Perception for Vision Transformers**
- Paper: https://arxiv.org/abs/2311.17132
- Code: https://github.com/DaiShiResearch/TransNeXt
# CLIP
**Alpha-CLIP: A CLIP Model Focusing on Wherever You Want**
- Paper: https://arxiv.org/abs/2312.03818
- Code: https://github.com/SunzeY/AlphaCLIP
**FairCLIP: Harnessing Fairness in Vision-Language Learning**
- Paper: https://arxiv.org/abs/2403.19949
- Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP
# MAE
# Embodied AI
**EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI**
- Homepage: https://tai-wang.github.io/embodiedscan/
- Paper: https://arxiv.org/abs/2312.16170
- Code: https://github.com/OpenRobotLab/EmbodiedScan
**MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception**
- Homepage: https://iranqin.github.io/MP5.github.io/
- Paper: https://arxiv.org/abs/2312.07472
- Code: https://github.com/IranQin/MP5
**LEMON: Learning 3D Human-Object Interaction Relation from 2D Images**
- Paper: https://arxiv.org/abs/2312.08963
- Code: https://github.com/yyvhang/lemon_3d
# GAN
# OCR
**An Empirical Study of Scaling Law for OCR**
- Paper: https://arxiv.org/abs/2401.00028
- Code: https://github.com/large-ocr-model/large-ocr-model.github.io
**ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting**
- Paper: https://arxiv.org/abs/2403.00303
- Code: https://github.com/PriNing/ODM
# NeRF
**PIE-NeRF🍕: Physics-based Interactive Elastodynamics with NeRF**
- Paper: https://arxiv.org/abs/2311.13099
- Code: https://github.com/FYTalon/pienerf/
# DETR
**DETRs Beat YOLOs on Real-time Object Detection**
- Paper: https://arxiv.org/abs/2304.08069
- Code: https://github.com/lyuwenyu/RT-DETR
**Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement**
- Paper: https://arxiv.org/abs/2403.16131
- Code: https://github.com/xiuqhou/Salience-DETR
# Prompt
# Multimodal Large Language Models (MLLM)
**mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration**
- Paper: https://arxiv.org/abs/2311.04257
- Code: https://github.com/X-PLUG/mPLUG-Owl/tree/main/mPLUG-Owl2
**Link-Context Learning for Multimodal LLMs**
- Paper: https://arxiv.org/abs/2308.07891
- Code: https://github.com/isekai-portal/Link-Context-Learning/tree/main
**OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation**
- Paper: https://arxiv.org/abs/2311.17911
- Code: https://github.com/shikiw/OPERA
**Making Large Multimodal Models Understand Arbitrary Visual Prompts**
- Homepage: https://vip-llava.github.io/
- Paper: https://arxiv.org/abs/2312.00784
**Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs**
- Paper: https://arxiv.org/abs/2310.00582
- Code: https://github.com/SY-Xuan/Pink
**Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding**
- Paper: https://arxiv.org/abs/2311.08046
- Code: https://github.com/PKU-YuanGroup/Chat-UniVi
**OneLLM: One Framework to Align All Modalities with Language**
- Paper: https://arxiv.org/abs/2312.03700
- Code: https://github.com/csuhan/OneLLM
# Large Language Models (LLM)
**VTimeLLM: Empower LLM to Grasp Video Moments**
- Paper: https://arxiv.org/abs/2311.18445
- Code: https://github.com/huangb23/VTimeLLM
# NAS
# ReID (Re-Identification)
**Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification**
- Paper: https://arxiv.org/abs/2403.10254
- Code: https://github.com/924973292/EDITOR
**Noisy-Correspondence Learning for Text-to-Image Person Re-identification**
- Paper: https://arxiv.org/abs/2308.09911
- Code: https://github.com/QinYang79/RDE
# Diffusion Models
**InstanceDiffusion: Instance-level Control for Image Generation**
- Homepage: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/
- Paper: https://arxiv.org/abs/2402.03290
- Code: https://github.com/frank-xwang/InstanceDiffusion
**Residual Denoising Diffusion Models**
- Paper: https://arxiv.org/abs/2308.13712
- Code: https://github.com/nachifur/RDDM
**DeepCache: Accelerating Diffusion Models for Free**
- Paper: https://arxiv.org/abs/2312.00858
- Code: https://github.com/horseee/DeepCache
**DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations**
- Homepage: https://tianhao-qi.github.io/DEADiff/
- Paper: https://arxiv.org/abs/2403.06951
- Code: https://github.com/Tianhao-Qi/DEADiff_code
**SVGDreamer: Text Guided SVG Generation with Diffusion Model**
- Paper: https://arxiv.org/abs/2312.16476
- Code: https://ximinng.github.io/SVGDreamer-project/
**InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model**
- Paper: https://arxiv.org/abs/2312.05849
- Code: https://github.com/jiuntian/interactdiffusion
**MMA-Diffusion: MultiModal Attack on Diffusion Models**
- Paper: https://arxiv.org/abs/2311.17516
- Code: https://github.com/yangyijune/MMA-Diffusion
**VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models**
- Homepage: https://video-motion-customization.github.io/
- Paper: https://arxiv.org/abs/2312.00845
- Code: https://github.com/HyeonHo99/Video-Motion-Customization
# Vision Transformer
**TransNeXt: Robust Foveal Visual Perception for Vision Transformers**
- Paper: https://arxiv.org/abs/2311.17132
- Code: https://github.com/DaiShiResearch/TransNeXt
**RepViT: Revisiting Mobile CNN From ViT Perspective**
- Paper: https://arxiv.org/abs/2307.09283
- Code: https://github.com/THU-MIG/RepViT
**A General and Efficient Training for Transformer via Token Expansion**
- Paper: https://arxiv.org/abs/2404.00672
- Code: https://github.com/Osilly/TokenExpansion
# Vision-Language
**PromptKD: Unsupervised Prompt Distillation for Vision-Language Models**
- Paper: https://arxiv.org/abs/2403.02781
- Code: https://github.com/zhengli97/PromptKD
**FairCLIP: Harnessing Fairness in Vision-Language Learning**
- Paper: https://arxiv.org/abs/2403.19949
- Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP
# Object Detection
**DETRs Beat YOLOs on Real-time Object Detection**
- Paper: https://arxiv.org/abs/2304.08069
- Code: https://github.com/lyuwenyu/RT-DETR
**Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation**
- Paper: https://arxiv.org/abs/2312.01220
- Code: https://github.com/ZPDu/Boosting-Object-Detection-with-Zero-Shot-Day-Night-Domain-Adaptation
**YOLO-World: Real-Time Open-Vocabulary Object Detection**
- Paper: https://arxiv.org/abs/2401.17270
- Code: https://github.com/AILab-CVC/YOLO-World
**Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement**
- Paper: https://arxiv.org/abs/2403.16131
- Code: https://github.com/xiuqhou/Salience-DETR
# Anomaly Detection
**Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection**
- Paper: https://arxiv.org/abs/2310.12790
- Code: https://github.com/mala-lab/AHL
# Object Tracking
**Delving into the Trajectory Long-tail Distribution for Muti-object Tracking**
- Paper: https://arxiv.org/abs/2403.04700
- Code: https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT
# Semantic Segmentation
**Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation**
- Paper: https://arxiv.org/abs/2312.04265
- Code: https://github.com/w1oves/Rein
**SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation**
- Paper: https://arxiv.org/abs/2311.15537
- Code: https://github.com/xb534/SED
# Medical Image
**Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology**
- Paper: https://arxiv.org/abs/2402.17228
- Code: https://github.com/DearCaat/RRT-MIL
**VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis**
- Paper: https://arxiv.org/abs/2402.17300
- Code: https://github.com/Luffy03/VoCo
**ChAda-ViT: Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images**
- Paper: https://arxiv.org/abs/2311.15264
- Code: https://github.com/nicoboou/chada_vit
# Medical Image Segmentation
# Autonomous Driving
**UniPAD: A Universal Pre-training Paradigm for Autonomous Driving**
- Paper: https://arxiv.org/abs/2310.08370
- Code: https://github.com/Nightmare-n/UniPAD
**Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications**
- Paper: https://arxiv.org/abs/2311.17663
- Code: https://github.com/haomo-ai/Cam4DOcc
**Memory-based Adapters for Online 3D Scene Perception**
- Paper: https://arxiv.org/abs/2403.06974
- Code: https://github.com/xuxw98/Online3D
**Symphonize 3D Semantic Scene Completion with Contextual Instance Queries**
- Paper: https://arxiv.org/abs/2306.15670
- Code: https://github.com/hustvl/Symphonies
**A Real-world Large-scale Dataset for Roadside Cooperative Perception**
- Paper: https://arxiv.org/abs/2403.10145
- Code: https://github.com/AIR-THU/DAIR-RCooper
**Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving**
- Paper: https://arxiv.org/abs/2403.07535
- Code: https://github.com/Junda24/AFNet
**Traffic Scene Parsing through the TSP6K Dataset**
- Paper: https://arxiv.org/pdf/2303.02835.pdf
- Code: https://github.com/PengtaoJiang/TSP6K
# 3D Point Cloud
# 3D Object Detection
**PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection**
- Paper: https://arxiv.org/abs/2312.08371
- Code: https://github.com/kuanchihhuang/PTT
**UniMODE: Unified Monocular 3D Object Detection**
- Paper: https://arxiv.org/abs/2402.18573
# 3D Semantic Segmentation
# Image Editing
**Edit One for All: Interactive Batch Image Editing**
- Homepage: https://thaoshibe.github.io/edit-one-for-all
- Paper: https://arxiv.org/abs/2401.10219
- Code: https://github.com/thaoshibe/edit-one-for-all
# Video Editing
**MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers**
- Homepage: https://maskint.github.io/
- Paper: https://arxiv.org/abs/2312.12468
# Low-level Vision
**Residual Denoising Diffusion Models**
- Paper: https://arxiv.org/abs/2308.13712
- Code: https://github.com/nachifur/RDDM
**Boosting Image Restoration via Priors from Pre-trained Models**
- Paper: https://arxiv.org/abs/2403.06793
# Super-Resolution
**SeD: Semantic-Aware Discriminator for Image Super-Resolution**
- Paper: https://arxiv.org/abs/2402.19387
- Code: https://github.com/lbc12345/SeD
**APISR: Anime Production Inspired Real-World Anime Super-Resolution**
- Paper: https://arxiv.org/abs/2403.01598
- Code: https://github.com/Kiteretsu77/APISR
# Denoising
## Image Denoising
# 3D Human Pose Estimation
**Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation**
- Paper: https://arxiv.org/abs/2311.12028
- Code: https://github.com/NationalGAILab/HoT
# Image Generation
**InstanceDiffusion: Instance-level Control for Image Generation**
- Homepage: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/
- Paper: https://arxiv.org/abs/2402.03290
- Code: https://github.com/frank-xwang/InstanceDiffusion
**ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations**
- Homepage: https://eclipse-t2i.vercel.app/
- Paper: https://arxiv.org/abs/2312.04655
- Code: https://github.com/eclipse-t2i/eclipse-inference
**Instruct-Imagen: Image Generation with Multi-modal Instruction**
- Paper: https://arxiv.org/abs/2401.01952
**Residual Denoising Diffusion Models**
- Paper: https://arxiv.org/abs/2308.13712
- Code: https://github.com/nachifur/RDDM
**UniGS: Unified Representation for Image Generation and Segmentation**
- Paper: https://arxiv.org/abs/2312.01985
**Multi-Instance Generation Controller for Text-to-Image Synthesis**
- Paper: https://arxiv.org/abs/2402.05408
- Code: https://github.com/limuloo/migc
**SVGDreamer: Text Guided SVG Generation with Diffusion Model**
- Paper: https://arxiv.org/abs/2312.16476
- Code: https://ximinng.github.io/SVGDreamer-project/
**InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model**
- Paper: https://arxiv.org/abs/2312.05849
- Code: https://github.com/jiuntian/interactdiffusion
**Ranni: Taming Text-to-Image Diffusion for Accurate Prompt Following**
- Paper: https://arxiv.org/abs/2311.17002
- Code: https://github.com/ali-vilab/Ranni
# Video Generation
**Vlogger: Make Your Dream A Vlog**
- Paper: https://arxiv.org/abs/2401.09414
- Code: https://github.com/Vchitect/Vlogger
**VBench: Comprehensive Benchmark Suite for Video Generative Models**
- Homepage: https://vchitect.github.io/VBench-project/
- Paper: https://arxiv.org/abs/2311.17982
- Code: https://github.com/Vchitect/VBench
**VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models**
- Homepage: https://video-motion-customization.github.io/
- Paper: https://arxiv.org/abs/2312.00845
- Code: https://github.com/HyeonHo99/Video-Motion-Customization
# 3D Generation
**CityDreamer: Compositional Generative Model of Unbounded 3D Cities**
- Homepage: https://haozhexie.com/project/city-dreamer/
- Paper: https://arxiv.org/abs/2309.00610
- Code: https://github.com/hzxie/city-dreamer
**LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching**
- Paper: https://arxiv.org/abs/2311.11284
- Code: https://github.com/EnVision-Research/LucidDreamer
# Video Understanding
**MVBench: A Comprehensive Multi-modal Video Understanding Benchmark**
- Paper: https://arxiv.org/abs/2311.17005
- Code: https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2
# Knowledge Distillation
**Logit Standardization in Knowledge Distillation**
- Paper: https://arxiv.org/abs/2403.01427
- Code: https://github.com/sunshangquan/logit-standardization-KD
**Efficient Dataset Distillation via Minimax Diffusion**
- Paper: https://arxiv.org/abs/2311.15529
- Code: https://github.com/vimar-gu/MinimaxDiffusion
# Stereo Matching
**Neural Markov Random Field for Stereo Matching**
- Paper: https://arxiv.org/abs/2403.11193
- Code: https://github.com/aeolusguan/NMRF
# Scene Graph Generation
**HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation**
- Homepage: https://zhangce01.github.io/HiKER-SGG/
- Paper: https://arxiv.org/abs/2403.12033
- Code: https://github.com/zhangce01/HiKER-SGG
# Video Quality Assessment
**KVQ: Kaleidoscope Video Quality Assessment for Short-form Videos**
- Homepage: https://lixinustc.github.io/projects/KVQ/
- Paper: https://arxiv.org/abs/2402.07220
- Code: https://github.com/lixinustc/KVQ-Challenge-CVPR-NTIRE2024
# Datasets
**A Real-world Large-scale Dataset for Roadside Cooperative Perception**
- Paper: https://arxiv.org/abs/2403.10145
- Code: https://github.com/AIR-THU/DAIR-RCooper
**Traffic Scene Parsing through the TSP6K Dataset**
- Paper: https://arxiv.org/pdf/2303.02835.pdf
- Code: https://github.com/PengtaoJiang/TSP6K
# Others
**Object Recognition as Next Token Prediction**
- Paper: https://arxiv.org/abs/2312.02142
- Code: https://github.com/kaiyuyue/nxtp
**ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks**
- Paper: https://arxiv.org/abs/2306.14525
- Code: https://parameternet.github.io/
**Seamless Human Motion Composition with Blended Positional Encodings**
- Paper: https://arxiv.org/abs/2402.15509
- Code: https://github.com/BarqueroGerman/FlowMDM
**LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning**
- Homepage: https://ll3da.github.io/
- Paper: https://arxiv.org/abs/2311.18651
- Code: https://github.com/Open3DA/LL3DA
**CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update**
- Homepage: https://clova-tool.github.io/
- Paper: https://arxiv.org/abs/2312.10908
**MoMask: Generative Masked Modeling of 3D Human Motions**
- Paper: https://arxiv.org/abs/2312.00063
- Code: https://github.com/EricGuo5513/momask-codes
**Amodal Ground Truth and Completion in the Wild**
- Homepage: https://www.robots.ox.ac.uk/~vgg/research/amodal/
- Paper: https://arxiv.org/abs/2312.17247
- Code: https://github.com/Championchess/Amodal-Completion-in-the-Wild
**Improved Visual Grounding through Self-Consistent Explanations**
- Paper: https://arxiv.org/abs/2312.04554
- Code: https://github.com/uvavision/SelfEQ
**ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object**
- Homepage: https://chenshuang-zhang.github.io/imagenet_d/
- Paper: https://arxiv.org/abs/2403.18775
- Code: https://github.com/chenshuang-zhang/imagenet_d
**Learning from Synthetic Human Group Activities**
- Homepage: https://cjerry1243.github.io/M3Act/
- Paper: https://arxiv.org/abs/2306.16772
- Code: https://github.com/cjerry1243/M3Act
**A Cross-Subject Brain Decoding Framework**
- Homepage: https://littlepure2333.github.io/MindBridge/
- Paper: https://arxiv.org/abs/2404.07850
- Code: https://github.com/littlepure2333/MindBridge
**Multi-Task Dense Prediction via Mixture of Low-Rank Experts**
- Paper: https://arxiv.org/abs/2403.17749
- Code: https://github.com/YuqiYang213/MLoRE
**Contrastive Mean-Shift Learning for Generalized Category Discovery**
- Homepage: https://postech-cvlab.github.io/cms/
- Paper: https://arxiv.org/abs/2404.09451
- Code: https://github.com/sua-choi/CMS
================================================
FILE: CVPR2025-Papers-with-Code.md
================================================
# CVPR 2025 Papers and Open-Source Projects Collection (Papers with Code)
CVPR 2025 decisions are now available on OpenReview! Acceptance rate: 22.1% (2878 / 13008)
> Note 1: Everyone is welcome to open an issue to share CVPR 2025 papers and open-source projects!
>
> Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision
>
> - [ICCV 2025](https://github.com/amusi/ICCV2025-Papers-with-Code)
> - [ECCV 2024](https://github.com/amusi/ECCV2024-Papers-with-Code)
> - [CVPR 2024](CVPR2024-Papers-with-Code.md)
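Scan the QR code to join the CVer academic exchange group for the latest work from CVPR 2025 and beyond! It is the largest computer vision AI Knowledge Planet community. Updated daily, it shares the latest learning materials on computer vision, AIGC, diffusion models, multimodal learning, deep learning, autonomous driving, medical imaging, remote sensing, and more.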

# CVPR 2025 Papers with Open-Source Code: Table of Contents
- [3DGS(Gaussian Splatting)](#3DGS)
- [Agent](#Agent)
- [Avatars](#Avatars)
- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [Mamba](#Mamba)
- [Embodied AI](#Embodied-AI)
- [GAN](#GAN)
- [GNN](#GNN)
- [Multimodal Large Language Models (MLLM)](#MLLM)
- [Large Language Models (LLM)](#LLM)
- [NAS](#NAS)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [DETR](#DETR)
- [Diffusion Models](#Diffusion)
- [ReID (Re-Identification)](#ReID)
- [Long-Tail Distribution](#Long-Tail)
- [Vision Transformer](#Vision-Transformer)
- [Vision-Language](#VL)
- [Self-supervised Learning](#SSL)
- [Data Augmentation](#DA)
- [Object Detection](#Object-Detection)
- [Anomaly Detection](#Anomaly-Detection)
- [Visual Tracking](#VT)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Panoptic Segmentation](#Panoptic-Segmentation)
- [Medical Image](#MI)
- [Medical Image Segmentation](#MIS)
- [Video Object Segmentation](#VOS)
- [Video Instance Segmentation](#VIS)
- [Referring Image Segmentation](#RIS)
- [Image Matting](#Matting)
- [Image Editing](#Image-Editing)
- [Low-level Vision](#LLV)
- [Super-Resolution](#SR)
- [Denoising](#Denoising)
- [Deblur](#Deblur)
- [Autonomous Driving](#Autonomous-Driving)
- [3D Point Cloud](#3D-Point-Cloud)
- [3D Object Detection](#3DOD)
- [3D Semantic Segmentation](#3DSS)
- [3D Object Tracking](#3D-Object-Tracking)
- [3D Semantic Scene Completion](#3DSSC)
- [3D Registration](#3D-Registration)
- [3D Human Pose Estimation](#3D-Human-Pose-Estimation)
- [3D Human Mesh Estimation](#3D-Human-Pose-Estimation)
- [3D Visual Grounding](#3DVG)
- [Medical Image](#Medical-Image)
- [Image Generation](#Image-Generation)
- [Video Generation](#Video-Generation)
- [3D Generation](#3D-Generation)
- [Video Understanding](#Video-Understanding)
- [Action Detection](#Action-Detection)
- [Embodied AI](#Embodied)
- [Text Detection](#Text-Detection)
- [Knowledge Distillation](#KD)
- [Model Pruning](#Pruning)
- [Image Compression](#IC)
- [3D Reconstruction](#3D-Reconstruction)
- [Depth Estimation](#Depth-Estimation)
- [Trajectory Prediction](#TP)
- [Lane Detection](#Lane-Detection)
- [Image Captioning](#Image-Captioning)
- [Visual Question Answering](#VQA)
- [Sign Language Recognition](#SLR)
- [Video Prediction](#Video-Prediction)
- [Novel View Synthesis](#NVS)
- [Zero-Shot Learning](#ZSL)
- [Stereo Matching](#Stereo-Matching)
- [Feature Matching](#Feature-Matching)
- [Low-light Image Enhancement](#Low-light)
- [Scene Graph Generation](#SGG)
- [Style Transfer](#ST)
- [Implicit Neural Representations](#INR)
- [Image Quality Assessment](#IQA)
- [Video Quality Assessment](#Video-Quality-Assessment)
- [Compressive Sensing](#CS)
- [Datasets](#Datasets)
- [New Tasks](#New-Tasks)
- [Others](#Others)
# 3DGS(Gaussian Splatting)
# Agent
**SpiritSight Agent: Advanced GUI Agent with One Look**
- Paper: https://arxiv.org/abs/2503.03196
- Code: https://hzhiyuan.github.io/SpiritSight-Agent
# Avatars
# Backbone
**Building Vision Models upon Heat Conduction**
- Paper: https://arxiv.org/abs/2405.16555
- Code: https://github.com/MzeroMiko/vHeat
**LSNet: See Large, Focus Small**
- Paper: https://arxiv.org/abs/2503.23135
- Code: https://github.com/jameslahm/lsnet
# CLIP
# Mamba
**MambaVision: A Hybrid Mamba-Transformer Vision Backbone**
- Paper: https://arxiv.org/abs/2407.08083
- Code: https://github.com/NVlabs/MambaVision
**MobileMamba: Lightweight Multi-Receptive Visual Mamba Network**
- Paper: https://arxiv.org/abs/2411.15941
- Code: https://github.com/lewandofskee/MobileMamba
**MambaIC: State Space Models for High-Performance Learned Image Compression**
- Paper: https://arxiv.org/abs/2503.12461
- Code: https://arxiv.org/abs/2503.12461
# Embodied AI
**CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos**
- Project: https://ai4ce.github.io/CityWalker/
- Paper: https://arxiv.org/abs/2411.17820
- Code: https://github.com/ai4ce/CityWalker
# GAN
# OCR
# NeRF
# DETR
**Mr. DETR: Instructive Multi-Route Training for Detection Transformers**
- Paper: https://arxiv.org/abs/2412.10028
- Code: https://github.com/Visual-AI/Mr.DETR
# Prompt
# Multimodal Large Language Models (MLLM)
**LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences**
- Paper: https://arxiv.org/abs/2412.01292
- Code: https://github.com/Hoyyyaard/LSceneLLM
**DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution**
- Paper: https://arxiv.org/abs/2405.16071
- Code: https://github.com/callsys/DynRefer
**Retrieval-Augmented Personalization for Multimodal Large Language Models**
- Project Page: https://hoar012.github.io/RAP-Project/
- Paper: https://arxiv.org/abs/2410.13360
- Code: https://github.com/Hoar012/RAP-MLLM
**BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models**
- Paper: https://arxiv.org/abs/2411.15232
- Code: https://github.com/HealthX-Lab/BiomedCoOp
**FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression**
- Paper: https://arxiv.org/abs/2412.04317
- Code: https://github.com/codefanw/FlashSloth
**MMRL: Multi-Modal Representation Learning for Vision-Language Models**
- Paper: https://arxiv.org/abs/2503.08497
- Code: https://github.com/yunncheng/MMRL
**PAVE: Patching and Adapting Video Large Language Models**
- Paper: https://arxiv.org/abs/2503.19794
- Code: https://github.com/dragonlzm/PAVE
**AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization**
- Paper: https://arxiv.org/abs/2503.23733
- Code: https://github.com/THUNLP-MT/AdaMMS
# Large Language Models (LLM)
# NAS
# ReID (Re-Identification)
**From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization**
- Paper: https://arxiv.org/abs/2503.00938
- Code: https://github.com/yuanc3/Pose2ID
**AirRoom: Objects Matter in Room Reidentification**
- Project: https://sairlab.org/airroom/
- Paper: https://arxiv.org/abs/2503.01130
**IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification**
- Paper: https://arxiv.org/abs/2503.10324
- Code: https://github.com/924973292/IDEA
# Diffusion Models
**TinyFusion: Diffusion Transformers Learned Shallow**
- Paper: https://arxiv.org/abs/2412.01199
- Code: https://github.com/VainF/TinyFusion
**DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture**
- Paper: https://arxiv.org/abs/2409.03550
- Code: https://github.com/qianlong0502/DKDM
**Tiled Diffusion**
- Homepage: https://madaror.github.io/tiled-diffusion.github.io/
- Paper: https://arxiv.org/abs/2412.15185
- Code: https://github.com/madaror/tiled-diffusion
# Vision Transformer
# Vision-Language
**NLPrompt: Noise-Label Prompt Learning for Vision-Language Models**
- Paper: https://arxiv.org/abs/2412.01256
- Code: https://github.com/qunovo/NLPrompt
**PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability**
- Paper: https://arxiv.org/abs/2503.08481
- Code: https://github.com/unira-zwj/PhysVLM
**MMRL: Multi-Modal Representation Learning for Vision-Language Models**
- Paper: https://arxiv.org/abs/2503.08497
- Code: https://github.com/yunncheng/MMRL
# Object Detection
**LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models**
- Paper: https://arxiv.org/abs/2501.18954
- Code: https://github.com/iSEE-Laboratory/LLMDet
**Mr. DETR: Instructive Multi-Route Training for Detection Transformers**
- Paper: https://arxiv.org/abs/2412.10028
- Code: https://github.com/Visual-AI/Mr.DETR
# Anomaly Detection
# Object Tracking
**Multiple Object Tracking as ID Prediction**
- Paper: https://arxiv.org/abs/2403.16848
- Code: https://github.com/MCG-NJU/MOTIP
**Omnidirectional Multi-Object Tracking**
- Paper: https://arxiv.org/abs/2503.04565
- Code: https://github.com/xifen523/OmniTrack
# Medical Image
**BrainMVP: Multi-modal Vision Pre-training for Medical Image Analysis**
- Paper: https://arxiv.org/abs/2410.10604
- Code: https://github.com/shaohao011/BrainMVP
# Medical Image Segmentation
**Test-Time Domain Generalization via Universe Learning: A Multi-Graph Matching Approach for Medical Image Segmentation**
- Paper: https://arxiv.org/abs/2503.13012
- Code: https://github.com/Yore0/TTDG-MGM
# Autonomous Driving
**LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes**
- Project: https://ldkong.com/LiMoE
- Paper: https://arxiv.org/abs/2501.04004
- Code: https://github.com/Xiangxu-0103/LiMoE
# 3D Point Cloud
**Unlocking Generalization Power in LiDAR Point Cloud Registration**
- Paper: https://arxiv.org/abs/2503.10149
- Code: https://github.com/peakpang/UGP
# 3D Object Detection
# 3D Semantic Segmentation
# Low-level Vision
# Super-Resolution
**AESOP: Auto-Encoded Supervision for Perceptual Image Super-Resolution**
- Paper: https://arxiv.org/abs/2412.00124
- Code: https://github.com/2minkyulee/AESOP-Auto-Encoded-Supervision-for-Perceptual-Image-Super-Resolution
# Denoising
## Image Denoising
# 3D Human Pose Estimation
**Reconstructing Humans with a Biomechanically Accurate Skeleton**
- Homepage: https://isshikihugh.github.io/HSMR/
- Code: https://github.com/IsshikiHugh/HSMR
# 3D Visual Grounding
**ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding**
- Homepage: https://pqh22.github.io/projects/ProxyTransformation/index.html
- Code: https://github.com/pqh22/ProxyTransformation
- Paper: https://arxiv.org/abs/2502.19247
# Image Generation
**Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models**
- Paper: https://arxiv.org/abs/2501.01423
- Code: https://github.com/hustvl/LightningDiT
**SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models**
- Paper: https://arxiv.org/abs/2412.04852
- Code: https://github.com/taco-group/SleeperMark
**TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation**
- Homepage: https://byteflow-ai.github.io/TokenFlow/
- Code: https://github.com/ByteFlow-AI/TokenFlow
- Paper: https://arxiv.org/abs/2412.03069
**PAR: Parallelized Autoregressive Visual Generation**
- Project: https://epiphqny.github.io/PAR-project/
- Paper: https://arxiv.org/abs/2412.15119
- Code: https://github.com/Epiphqny/PAR
**Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis**
- Project: https://generative-photography.github.io/project/
- Paper: https://arxiv.org/abs/2412.02168
- Code: https://github.com/pandayuanyu/generative-photography
**OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation**
- Project Page: https://opening-benchmark.github.io/
- Paper: https://arxiv.org/abs/2411.18499
- Code: https://github.com/LanceZPF/OpenING
# Video Generation
**Identity-Preserving Text-to-Video Generation by Frequency Decomposition**
- Paper: https://arxiv.org/abs/2411.17440
- Code: https://github.com/PKU-YuanGroup/ConsisID
**Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models**
- Paper: https://arxiv.org/abs/2407.15642
- Code: https://github.com/maxin-cn/Cinemo
**X-Dyna: Expressive Dynamic Human Image Animation**
- Paper: https://arxiv.org/abs/2501.10021
- Code: https://github.com/bytedance/X-Dyna
**PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation**
- Paper: https://arxiv.org/abs/2412.00596
- Code: https://github.com/pittisl/PhyT2V
**Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model**
- Project: https://liewfeng.github.io/TeaCache/
- Paper: https://arxiv.org/abs/2411.19108
- Code: https://github.com/ali-vilab/TeaCache
**AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion**
- Project: https://iva-mzsun.github.io/AR-Diffusion
- Paper: https://arxiv.org/abs/2503.07418
- Code: https://github.com/iva-mzsun/AR-Diffusion
# Image Editing
**Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing**
- Paper: https://arxiv.org/abs/2411.16832
- Code: https://github.com/taco-group/FaceLock
**h-Edit: Effective and Flexible Diffusion-Based Editing via Doob’s h-Transform**
- Paper: https://arxiv.org/abs/2503.02187
- Code: https://github.com/nktoan/h-edit
# Video Editing
# 3D Generation
**Generative Gaussian Splatting for Unbounded 3D City Generation**
- Project: https://haozhexie.com/project/gaussian-city
- Paper: https://arxiv.org/abs/2406.06526
- Code: https://github.com/hzxie/GaussianCity
**StdGEN: Semantic-Decomposed 3D Character Generation from Single Images**
- Project: https://stdgen.github.io/
- Paper: https://arxiv.org/abs/2411.05738
- Code: https://github.com/hyz317/StdGEN
# 3D Reconstruction
**Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass**
- Project: https://fast3r-3d.github.io/
- Paper: https://arxiv.org/abs/2501.13928
# Human Motion Generation
**SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance**
- Project: https://4dvlab.github.io/project_page/semgeomo/
- Paper: https://arxiv.org/abs/2503.01291
- Code: https://github.com/4DVLab/SemGeoMo
# Video Understanding
**Temporal Grounding Videos like Flipping Manga**
- Paper: https://arxiv.org/abs/2411.10332
- Code: https://github.com/yongliang-wu/NumPro
# Embodied AI
**Universal Actions for Enhanced Embodied Foundation Models**
- Project: https://2toinf.github.io/UniAct/
- Paper: https://arxiv.org/abs/2501.10105
- Code: https://github.com/2toinf/UniAct
**PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability**
- Paper: https://arxiv.org/abs/2503.08481
- Code: https://github.com/unira-zwj/PhysVLM
# Knowledge Distillation
# Depth Estimation
**DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos**
- Project: https://depthcrafter.github.io
- Paper: https://arxiv.org/abs/2409.02095
- Code: https://github.com/Tencent/DepthCrafter
**MonSter: Marry Monodepth to Stereo Unleashes Power**
- Paper: https://arxiv.org/abs/2501.08643
- Code: https://github.com/Junda24/MonSter
**DEFOM-Stereo: Depth Foundation Model Based Stereo Matching**
- Project: https://insta360-research-team.github.io/DEFOM-Stereo/
- Paper: https://arxiv.org/abs/2501.09466
- Code: https://github.com/Insta360-Research-Team/DEFOM-Stereo
# Stereo Matching
**MonSter: Marry Monodepth to Stereo Unleashes Power**
- Paper: https://arxiv.org/abs/2501.08643
- Code: https://github.com/Junda24/MonSter
# Low-light Image Enhancement
**HVI: A New Color Space for Low-light Image Enhancement**
- Paper: https://arxiv.org/abs/2502.20272
- Code: https://github.com/Fediory/HVI-CIDNet
- Demo: https://huggingface.co/spaces/Fediory/HVI-CIDNet_Low-light-Image-Enhancement_
**ReDDiT: Efficient Diffusion as Low Light Enhancer**
- Paper: https://arxiv.org/abs/2410.12346
- Code: https://github.com/lgz-0713/ReDDiT
# Image Compression
**MambaIC: State Space Models for High-Performance Learned Image Compression**
- Paper: https://arxiv.org/abs/2503.12461
- Code:
# Scene Graph Generation
# Style Transfer
**StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements**
- Project: https://stylestudio-official.github.io/
- Paper: https://arxiv.org/abs/2412.08503
- Code: https://github.com/Westlake-AGI-Lab/StyleStudio
# Image Quality Assessment
**Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language**
- Homepage: https://yichengchen24.github.io/projects/autocherrypicker
- Paper: https://arxiv.org/abs/2406.20085
- Code: https://github.com/yichengchen24/ACP
# Video Quality Assessment
# Compressive Sensing
**Using Powerful Prior Knowledge of Diffusion Model in Deep Unfolding Networks for Image Compressive Sensing**
- Paper: https://arxiv.org/abs/2503.08429
- Code: https://github.com/FengodChen/DMP-DUN-CVPR2025
# Datasets
**Objaverse++: Curated 3D Object Dataset with Quality Annotations**
- Paper: https://arxiv.org/abs/2504.07334
- Code: https://github.com/TCXX/ObjaversePlusPlus
# Others
**DTGBrepGen: A Novel B-rep Generative Model through Decoupling Topology and Geometry**
- Paper: https://arxiv.org/abs/2503.13110
- Code: https://github.com/jinli99/DTGBrepGen
**Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation**
- Paper: https://arxiv.org/abs/2503.19307
- Code: https://github.com/delaprada/HandSynthesis.git
**EVOS: Efficient Implicit Neural Training via EVOlutionary Selector**
- Homepage: https://weixiang-zhang.github.io/proj-evos/
- Paper: https://arxiv.org/abs/2412.10153
- Code: https://github.com/zwx-open/EVOS-INR
================================================
FILE: README.md
================================================
# CVPR 2026 Papers and Open-Source Projects (Papers with Code)
CVPR 2026 decisions are now available on OpenReview! Acceptance rate: 25.42% (4090 / 16092).
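> Note 1: Everyone is welcome to open an issue to share CVPR 2026 papers and open-source projects!
>
> Note 2: For papers from past top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision
>
> - [ICCV 2025](https://github.com/amusi/ICCV2025-Papers-with-Code)
> - [ECCV 2024](https://github.com/amusi/ECCV2024-Papers-with-Code)
Scan the QR code to join the CVer academic exchange group and follow cutting-edge work such as CVPR 2026! It is the largest computer vision and AI Knowledge Planet community, updated daily with the latest learning materials on computer vision, AIGC, diffusion models, multimodal learning, deep learning, autonomous driving, medical imaging, and remote sensing.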

# CVPR 2026 Papers with Code: Contents
- [3DGS (Gaussian Splatting)](#3DGS)
- [Agent](#Agent)
- [Avatars](#Avatars)
- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [Mamba](#Mamba)
- [Embodied AI](#Embodied-AI)
- [GAN](#GAN)
- [GNN](#GNN)
- [Multimodal Large Language Models (MLLM)](#MLLM)
- [Large Language Models (LLM)](#LLM)
- [Embodied AI](#Embodied)
- [Spatial Intelligence](#SI)
- [NAS](#NAS)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [DETR](#DETR)
- [Diffusion Models](#Diffusion)
- [ReID (Re-Identification)](#ReID)
- [Long-Tail](#Long-Tail)
- [Vision Transformer](#Vision-Transformer)
- [Vision-Language](#VL)
- [Self-supervised Learning](#SSL)
- [Data Augmentation](#DA)
- [Object Detection](#Object-Detection)
- [Anomaly Detection](#Anomaly-Detection)
- [Visual Tracking](#VT)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Panoptic Segmentation](#Panoptic-Segmentation)
- [Medical Image](#MI)
- [Medical Image Segmentation](#MIS)
- [Video Object Segmentation](#VOS)
- [Video Instance Segmentation](#VIS)
- [Referring Image Segmentation](#RIS)
- [Image Matting](#Matting)
- [Image Editing](#Image-Editing)
- [Low-level Vision](#LLV)
- [Super-Resolution](#SR)
- [Denoising](#Denoising)
- [Deblur](#Deblur)
- [Autonomous Driving](#Autonomous-Driving)
- [3D Point Cloud](#3D-Point-Cloud)
- [3D Object Detection](#3DOD)
- [3D Semantic Segmentation](#3DSS)
- [3D Object Tracking](#3D-Object-Tracking)
- [3D Semantic Scene Completion](#3DSSC)
- [3D Registration](#3D-Registration)
- [3D Human Pose Estimation](#3D-Human-Pose-Estimation)
- [3D Human Mesh Estimation](#3D-Human-Pose-Estimation)
- [3D Visual Grounding](#3DVG)
- [Medical Image](#Medical-Image)
- [Image Generation](#Image-Generation)
- [Video Generation](#Video-Generation)
- [3D Generation](#3D-Generation)
- [Video Understanding](#Video-Understanding)
- [Action Detection](#Action-Detection)
- [Remote Sensing](#Remote)
- [Text Detection](#Text-Detection)
- [Knowledge Distillation](#KD)
- [Model Pruning](#Pruning)
- [Image Compression](#IC)
- [Video Compression](#VC)
- [3D Reconstruction](#3D-Reconstruction)
- [Depth Estimation](#Depth-Estimation)
- [Trajectory Prediction](#TP)
- [Lane Detection](#Lane-Detection)
- [Image Captioning](#Image-Captioning)
- [Visual Question Answering](#VQA)
- [Sign Language Recognition](#SLR)
- [Video Prediction](#Video-Prediction)
- [Novel View Synthesis](#NVS)
- [Zero-Shot Learning](#ZSL)
- [Stereo Matching](#Stereo-Matching)
- [Feature Matching](#Feature-Matching)
- [Low-light Image Enhancement](#Low-light)
- [Scene Graph Generation](#SGG)
- [Image Retrieval](#Image-Retrieval)
- [Style Transfer](#ST)
- [Implicit Neural Representations](#INR)
- [Image Quality Assessment](#IQA)
- [Video Quality Assessment](#Video-Quality-Assessment)
- [Compressive Sensing](#CS)
- [Datasets](#Datasets)
- [New Tasks](#New-Tasks)
- [Others](#Others)
# 3DGS (Gaussian Splatting)
**Dropping Anchor and Spherical Harmonics for Sparse-view Gaussian Splatting**
- Paper: https://arxiv.org/abs/2602.20933
- Code:
- Project: https://sk-fun.fun/DropAnSH-GS
**Topology-Aware Gaussian Splatting for Dynamic Mesh Modeling and Tracking**
- Paper: https://arxiv.org/abs/2512.01329
- Project: https://haza628.github.io/tagSplat/
**FastGS: Training 3D Gaussian Splatting in 100 Seconds**
- Paper: https://arxiv.org/abs/2511.04283
- Code: https://github.com/fastgs/FastGS
- Project: https://fastgs.github.io/
# Agent
# Avatars
# Backbone
# CLIP
# Mamba
# GAN
# OCR
# NeRF
# DETR
# Prompt
# Multimodal Large Language Models (MLLM)
**Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking**
- Paper: https://arxiv.org/abs/2602.20330
- Code: https://github.com/UIUC-MONET/vlm-circuit-tracing
**UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark**
- Paper: https://arxiv.org/abs/2603.05075
- Code:
- Project: https://any2any-mllm.github.io/unim/
# Large Language Models (LLM)
# Embodied AI
**Wanderland: Geometrically Grounded Simulation for Open-World Embodied AI**
- Paper: https://arxiv.org/abs/2511.20620
- Code: https://github.com/ai4ce/wanderland
- Project: https://ai4ce.github.io/wanderland/
# Spatial Intelligence
**Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning**
- Paper: https://arxiv.org/abs/2510.27606
- Code: https://github.com/InternLM/Spatial-SSRL
- Model: https://huggingface.co/internlm/Spatial-SSRL-7B
# NAS
# ReID (Re-Identification)
**MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-Identification**
- Paper: https://arxiv.org/abs/2512.03404
- Code: https://github.com/yjzhao1019/MOS
# Diffusion Models
# Vision Transformer
# Vision-Language
**StructXLIP: Enhancing Vision-language Models with Multimodal Structural Cues**
- Paper: https://arxiv.org/abs/2602.20089
- Code: https://github.com/intelligolabs/StructXLIP
**ApET: Approximation-Error Guided Token Compression for Efficient VLMs**
- Paper: https://arxiv.org/abs/2602.19870
- Code: https://github.com/MaQianKun0/ApET
**Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking**
- Paper: https://arxiv.org/abs/2602.20330
- Code: https://github.com/UIUC-MONET/vlm-circuit-tracing
# Object Detection
# Anomaly Detection
# Object Tracking
# Medical Image
# Medical Image Segmentation
**MedCLIPSeg: Probabilistic Vision–Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation**
- Paper: https://arxiv.org/abs/2602.20423
- Code: https://github.com/HealthX-Lab/MedCLIPSeg
- Project: https://tahakoleilat.github.io/MedCLIPSeg
# Autonomous Driving
**Open-Vocabulary Domain Generalization in Urban-Scene Segmentation**
- Paper: https://arxiv.org/abs/2602.18853
- Code: https://github.com/DZhaoXd/s2_corr
**U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences**
- Paper: https://arxiv.org/abs/2512.02982
- Code: https://github.com/worldbench/U4D
# 3D Point Cloud
**CLIPoint3D: Language-Grounded Few-Shot Unsupervised 3D Point Cloud Domain Adaptation**
- Paper: https://arxiv.org/abs/2602.20409
- Code: https://github.com/SarthakM320/CLIPoint3D
# 3D Object Detection
# 3D Semantic Segmentation
# Low-level Vision
# Super-Resolution
# Denoising
## Image Denoising
# 3D Human Pose Estimation
# 3D Visual Grounding
# Image Generation
**ExpPortrait: Expressive Portrait Generation via Personalized Representation**
- Paper: https://arxiv.org/abs/2602.19900
- Code:
# Video Generation
# Image Editing
# Video Editing
# 3D Generation
# 3D Reconstruction
**tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction**
- Project: https://cwchenwang.github.io/tttLRM/
- Paper: https://arxiv.org/abs/2602.20160
- Code: https://github.com/cwchenwang/tttLRM
**Flow3r: Factored Flow Prediction for Scalable Visual Geometry Learning**
- Project: https://flow3r-project.github.io/
- Paper: https://arxiv.org/abs/2602.20157
- Code: https://github.com/Kidrauh/flow3r
**RAP: Fast Feedforward Rendering-Free Attribute-Guided Primitive Importance Score Prediction for Efficient 3D Gaussian Splatting Processing**
- Paper: https://arxiv.org/abs/2602.19753
- Code: https://github.com/yyyykf/RAP
# Human Motion Generation
# Video Understanding
# Remote Sensing
**Brewing Stronger Features: Dual-Teacher Distillation for Multispectral Earth Observation**
- Paper: https://arxiv.org/abs/2602.19863
- Code: None
# Knowledge Distillation
# Depth Estimation
# Stereo Matching
# Low-light Image Enhancement
# Image Compression
# Video Compression
**UniComp: Rethinking Video Compression Through Informational Uniqueness**
- Paper: https://arxiv.org/abs/2512.03575
- Code: https://github.com/TimeMarker-LLM/UniComp
# Scene Graph Generation
# Image Retrieval
**PinPoint: Evaluation of Composed Image Retrieval with Explicit Negatives, Multi-Image Queries, and Paraphrase Testing**
- Paper: https://arxiv.org/abs/2603.04598
- Code:
# Style Transfer
# Image Quality Assessment
# Video Quality Assessment
# Compressive Sensing
# Datasets
# Others
**Decoupling Defense Strategies for Robust Image Watermarking**
- Paper: https://arxiv.org/abs/2602.20053
- Code: None
**Multi-Modal Representation Learning via Semi-Supervised Rate Reduction for Generalized Category Discovery**
- Paper: https://arxiv.org/abs/2602.19910
- Code:
**The Invisible Gorilla Effect in Out-of-distribution Detection**
- Paper: https://arxiv.org/abs/2602.20068
- Code: https://github.com/HarryAnthony/Invisible_Gorilla_Effect
**SimLBR: Learning to Detect Fake Images by Learning to Detect Real Images**
- Paper: https://arxiv.org/abs/2602.20412
- Code:
**RecoverMark: Robust Watermarking for Localization and Recovery of Manipulated Faces**
- Paper: https://arxiv.org/abs/2602.20618
- Code:
**Probing and Bridging Geometry-Interaction Cues for Affordance Reasoning in Vision Foundation Models**
- Paper:
- Code:
**GEM-TFL: Bridging Weak and Full Supervision for Forgery Localization through EM-Guided Decomposition and Temporal Refinement**
- Paper: https://arxiv.org/abs/2603.05095
- Code:
**FOZO: Forward-Only Zeroth-Order Prompt Optimization for Test-Time Adaptation**
- Paper: https://arxiv.org/abs/2603.04733
- Code: https://github.com/eVI-group-SCU/FOZO
**Mitigating Instance Entanglement in Instance-Dependent Partial Label Learning**
- Paper: https://arxiv.org/abs/2603.04825
- Code: https://github.com/RyanZhaoIc/CAD
================================================
FILE: master
================================================