Repository: amusi/CVPR2026-Papers-with-Code
Branch: main
Commit: 5709455e269a
Files: 9
Total size: 316.0 KB

Directory structure:
gitextract_f2cckni0/
├── CVPR2019-Papers-with-Code.md
├── CVPR2020-Papers-with-Code.md
├── CVPR2021-Papers-with-Code.md
├── CVPR2022-Papers-with-Code.md
├── CVPR2023-Papers-with-Code.md
├── CVPR2024-Papers-with-Code.md
├── CVPR2025-Papers-with-Code.md
├── README.md
└── master

================================================
FILE CONTENTS
================================================

================================================
FILE: CVPR2019-Papers-with-Code.md
================================================
# CVPR2019-Code

A collection of open-source projects for CVPR 2019 papers

Jump to: [Collection of open-source projects for CVPR 2020 papers](https://github.com/amusi/CVPR2020-Code)

See also: [Code links for 530 CVPR 2019 papers](./CVPR2019_CodeLink.csv)

- [Object Detection](#Object-Detection)
- [Object Tracking](#Object-Tracking)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [GAN](#GAN)
- [Face Detection](#Face-Detection)
- [Human Pose Estimation](#Human-Pose-Estimation)
- [6DoF Pose Estimation](#6DoF-Pose-Estimation)
- [Head Pose Estimation](#Head-Pose-Estimation)
- [Crowd Counting](#Crowd-Counting)

**Changelog:**

- 20200226: added the [collection of open-source projects for CVPR 2020 papers](https://github.com/amusi/CVPR2020-Code)
- 20191026: added [code links for 530 papers](./CVPR2019_CodeLink.csv)
- 20190405: added 8 papers (object detection, semantic segmentation, and other topics)
- 20190408: added 6 papers (object tracking, GAN, 6DoF pose estimation, and other topics)

<a name="Object-Detection"></a>

# Object Detection

**Bounding Box Regression with Uncertainty for Accurate Object Detection**

- arXiv: <https://arxiv.org/abs/1809.08545>
- github: <https://github.com/yihui-he/KL-Loss>

<a name="Object-Tracking"></a>

# Object Tracking

**Fast Online Object Tracking and Segmentation: A Unifying Approach**

- arXiv: <https://arxiv.org/abs/1812.05050>
- github: <https://github.com/foolwood/SiamMask>
- homepage: <http://www.robots.ox.ac.uk/~qwang/SiamMask>

**Unsupervised Deep Tracking**

- arXiv: <https://arxiv.org/abs/1904.01828>
- github: <https://github.com/594422814/UDT>
- github (PyTorch): <https://github.com/594422814/UDT_pytorch>

**Target-Aware Deep Tracking**

- arXiv: <https://arxiv.org/abs/1904.01772>
- homepage: <https://xinli-zn.github.io/TADT-project-page/>

<a name="Semantic-Segmentation"></a>

# Semantic Segmentation

**Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation**

- arXiv: <https://arxiv.org/abs/1903.02120>
- github (unofficial): <https://github.com/LinZhuoChen/DUpsampling>

**Dual Attention Network for Scene Segmentation**

- arXiv: <https://arxiv.org/abs/1809.02983>
- github: <https://github.com/junfu1115/DANet>

**Collaborative Global-Local Networks for Memory-Efficient Segmentation of Ultra-High Resolution Images**

- arXiv: None
- github: <https://github.com/chenwydj/ultra_high_resolution_segmentation>

<a name="Instance-Segmentation"></a>

# Instance Segmentation

**Mask Scoring R-CNN**

- arXiv: <https://arxiv.org/abs/1903.00241>
- github: <https://github.com/zjhuang22/maskscoring_rcnn>

<a name="GAN"></a>

# GAN

**Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis**

- arXiv: <https://arxiv.org/abs/1903.05628>
- github: <https://github.com/HelenMao/MSGAN>

<a name="Face-Detection"></a>

# Face Detection

**DSFD: Dual Shot Face Detector**

- arXiv: <https://arxiv.org/abs/1810.10220>
- github: <https://github.com/TencentYoutuResearch/FaceDetection-DSFD>

<a name="Human-Pose-Estimation"></a>

# Human Pose Estimation

**Deep High-Resolution Representation Learning for Human Pose Estimation**

- arXiv: <https://arxiv.org/abs/1902.09212>
- github: <https://github.com/leoxiaobin/deep-high-resolution-net.pytorch>

<a name="6DoF-Pose-Estimation"></a>

# 6DoF Pose Estimation

**PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation**

- arXiv: <https://arxiv.org/abs/1812.11788>
- github: <https://github.com/zju3dv/pvnet>

<a name="Head-Pose-Estimation"></a>

# Head Pose Estimation

**FSA-Net: Learning Fine-Grained Structure Aggregation for Head Pose Estimation from a Single Image**

- paper: <https://github.com/shamangary/FSA-Net/blob/master/0191.pdf>
- github: <https://github.com/shamangary/FSA-Net>

<a 
name="Crowd-Counting"></a>

# Crowd Counting

**Learning from Synthetic Data for Crowd Counting in the Wild**

- arXiv: <https://arxiv.org/abs/1903.03303>
- github: <https://github.com/gjy3035/GCC-SFCN>
- homepage: <https://gjy3035.github.io/GCC-CL/>

================================================
FILE: CVPR2020-Papers-with-Code.md
================================================
# CVPR2020-Code

A collection of open-source projects for [CVPR 2020](https://openaccess.thecvf.com/CVPR2020) papers. Everyone is welcome to open an issue to share open-source CVPR 2020 projects.

**[Recommended Reading]**

- [CVPR 2020 virtual](http://cvpr20.com/)
- The collection of open-source projects for ECCV 2020 papers is available at: https://github.com/amusi/ECCV2020-Code
- For papers from earlier top CV conferences (e.g. ECCV 2020, CVPR 2019, ICCV 2019), other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision

**[CVPR 2020 Open-Source Paper Index]**

- [CNN](#CNN)
- [Image Classification](#Image-Classification)
- [Video Classification](#Video-Classification)
- [Object Detection](#Object-Detection)
- [3D Object Detection](#3D-Object-Detection)
- [Video Object Detection](#Video-Object-Detection)
- [Object Tracking](#Object-Tracking)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Panoptic Segmentation](#Panoptic-Segmentation)
- [Video Object Segmentation](#VOS)
- [Superpixel Segmentation](#Superpixel)
- [Interactive Image Segmentation](#IIS)
- [NAS](#NAS)
- [GAN](#GAN)
- [Re-ID](#Re-ID)
- [3D Point Cloud (classification/segmentation/registration/tracking, etc.)](#3D-PointCloud)
- [Face (recognition/detection/reconstruction, etc.)](#Face)
- [Human Pose Estimation (2D/3D)](#Human-Pose-Estimation)
- [Human Parsing](#Human-Parsing)
- [Scene Text Detection](#Scene-Text-Detection)
- [Scene Text Recognition](#Scene-Text-Recognition)
- [Feature (Point) Detection and Description](#Feature)
- [Super-Resolution](#Super-Resolution)
- [Model Compression/Pruning](#Model-Compression)
- [Video Understanding/Action Recognition](#Action-Recognition)
- [Crowd Counting](#Crowd-Counting)
- [Depth Estimation](#Depth-Estimation)
- [6D Object Pose Estimation](#6DOF)
- [Hand Pose Estimation](#Hand-Pose)
- [Saliency Detection](#Saliency)
- [Denoising](#Denoising)
- [Deraining](#Deraining)
- [Deblurring](#Deblurring)
- [Dehazing](#Dehazing)
- [Visual Question Answering (VQA)](#VQA)
- [Video Question Answering (VideoQA)](#VideoQA)
- [Vision-Language Navigation](#VLN)
- [Video Compression](#Video-Compression)
- [Video Frame Interpolation](#Video-Frame-Interpolation)
- [Style Transfer](#Style-Transfer)
- [Lane Detection](#Lane-Detection)
- [Human-Object Interaction (HOI) Detection](#HOI)
- [Trajectory Prediction](#TP)
- [Motion Prediction](#Motion-Predication)
- [Optical Flow Estimation](#OF)
- [Image Retrieval](#IR)
- [Virtual Try-On](#Virtual-Try-On)
- [HDR](#HDR)
- [Adversarial Examples](#AE)
- [3D Reconstruction](#3D-Reconstructing)
- [Depth Completion](#DC)
- [Semantic Scene Completion](#SSC)
- [Image/Video Captioning](#Captioning)
- [Wireframe Parsing](#WP)
- [Datasets](#Datasets)
- [Others](#Others)
- [Acceptance Unconfirmed](#Not-Sure)

<a name="CNN"></a>

# CNN

**Exploring Self-attention for Image Recognition**

- Paper: https://hszhao.github.io/papers/cvpr20_san.pdf
- Code: https://github.com/hszhao/SAN

**Improving Convolutional Networks with Self-Calibrated Convolutions**

- Homepage: https://mmcheng.net/scconv/
- Paper: http://mftp.mmcheng.net/Papers/20cvprSCNet.pdf
- Code: https://github.com/backseason/SCNet

**Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets**

- Paper: https://arxiv.org/abs/2003.13549
- Code: https://github.com/zeiss-microscopy/BSConv

<a name="Image-Classification"></a>

# Image Classification

**Interpretable and Accurate Fine-grained Recognition via Region Grouping**

- Paper: https://arxiv.org/abs/2005.10411
- Code: https://github.com/zxhuang1698/interpretability-by-parts

**Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion**

- Paper: https://arxiv.org/abs/2003.04490
- Code: https://github.com/AdamKortylewski/CompositionalNets

**Spatially Attentive Output Layer for Image Classification**

- Paper: https://arxiv.org/abs/2004.07570
- Code (apparently deleted by the original author): https://github.com/ildoonet/spatially-attentive-output-layer

<a name="Video-Classification"></a>

# Video Classification

**SmallBigNet: Integrating Core and Contextual Views for Video Classification**

- Paper: https://arxiv.org/abs/2006.14582
- Code: https://github.com/xhl-video/SmallBigNet

<a name="Object-Detection"></a>

# Object Detection

**Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Li_Overcoming_Classifier_Imbalance_for_Long-Tail_Object_Detection_With_Balanced_Group_CVPR_2020_paper.pdf
- Code: https://github.com/FishYuLi/BalancedGroupSoftmax

**AugFPN: Improving Multi-scale Feature Learning for Object Detection**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Guo_AugFPN_Improving_Multi-Scale_Feature_Learning_for_Object_Detection_CVPR_2020_paper.pdf
- Code: https://github.com/Gus-Guo/AugFPN

**Noise-Aware Fully Webly Supervised Object Detection**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Shen_Noise-Aware_Fully_Webly_Supervised_Object_Detection_CVPR_2020_paper.html
- Code: https://github.com/shenyunhang/NA-fWebSOD/

**Learning a Unified Sample Weighting Network for Object Detection**

- Paper: https://arxiv.org/abs/2006.06568
- Code: https://github.com/caiqi/sample-weighting-network

**D2Det: Towards High Quality Object Detection and Instance Segmentation**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Cao_D2Det_Towards_High_Quality_Object_Detection_and_Instance_Segmentation_CVPR_2020_paper.pdf
- Code: https://github.com/JialeCao001/D2Det

**Dynamic Refinement Network for Oriented and Densely Packed Object Detection**

- Paper download link: https://arxiv.org/abs/2005.09973
- Code and dataset: https://github.com/Anymake/DRN_CVPR2020

**Scale-Equalizing Pyramid Convolution for Object Detection**

- Paper: https://arxiv.org/abs/2005.03101
- Code: https://github.com/jshilong/SEPC

**Revisiting the Sibling Head in Object Detector**

- Paper: https://arxiv.org/abs/2003.07540
- Code: https://github.com/Sense-X/TSD

**Detection in Crowded Scenes: One Proposal, Multiple Predictions**

- Paper: https://arxiv.org/abs/2003.09163
- Code: https://github.com/megvii-model/CrowdDetection

**Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection**

- Paper: https://arxiv.org/abs/2004.04725
- Code: https://github.com/NVlabs/wetectron

**Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection**

- Paper: https://arxiv.org/abs/1912.02424
- Code: https://github.com/sfzhang15/ATSS

**BiDet: An Efficient Binarized Object Detector**

- Paper: https://arxiv.org/abs/2003.03961
- 
代码:https://github.com/ZiweiWangTHU/BiDet **Harmonizing Transferability and Discriminability for Adapting Object Detectors** - 论文:https://arxiv.org/abs/2003.06297 - 代码:https://github.com/chaoqichen/HTCN **CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection** - 论文:https://arxiv.org/abs/2003.09119 - 代码:https://github.com/KiveeDong/CentripetalNet **Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection** - 论文:https://arxiv.org/abs/2003.11818 - 代码:https://github.com/ggjy/HitDet.pytorch **EfficientDet: Scalable and Efficient Object Detection** - 论文:https://arxiv.org/abs/1911.09070 - 代码:https://github.com/google/automl/tree/master/efficientdet <a name="3D-Object-Detection"></a> # 3D目标检测 **SESS: Self-Ensembling Semi-Supervised 3D Object Detection** - 论文: https://arxiv.org/abs/1912.11803 - 代码:https://github.com/Na-Z/sess **Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection** - 论文: https://arxiv.org/abs/2006.04356 - 代码:https://github.com/dleam/Associate-3Ddet **What You See is What You Get: Exploiting Visibility for 3D Object Detection** - 主页:https://www.cs.cmu.edu/~peiyunh/wysiwyg/ - 论文:https://arxiv.org/abs/1912.04986 - 代码:https://github.com/peiyunh/wysiwyg **Learning Depth-Guided Convolutions for Monocular 3D Object Detection** - 论文:https://arxiv.org/abs/1912.04799 - 代码:https://github.com/dingmyu/D4LCN **Structure Aware Single-stage 3D Object Detection from Point Cloud** - 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/He_Structure_Aware_Single-Stage_3D_Object_Detection_From_Point_Cloud_CVPR_2020_paper.html - 代码:https://github.com/skyhehe123/SA-SSD **IDA-3D: Instance-Depth-Aware 3D Object Detection from Stereo Vision for Autonomous Driving** - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Peng_IDA-3D_Instance-Depth-Aware_3D_Object_Detection_From_Stereo_Vision_for_Autonomous_CVPR_2020_paper.pdf - 代码:https://github.com/swords123/IDA-3D **Train in Germany, Test in The USA: 
Making 3D Object Detectors Generalize** - 论文:https://arxiv.org/abs/2005.08139 - 代码:https://github.com/cxy1997/3D_adapt_auto_driving **MLCVNet: Multi-Level Context VoteNet for 3D Object Detection** - 论文:https://arxiv.org/abs/2004.05679 - 代码:https://github.com/NUAAXQ/MLCVNet **3DSSD: Point-based 3D Single Stage Object Detector** - CVPR 2020 Oral - 论文:https://arxiv.org/abs/2002.10187 - 代码:https://github.com/tomztyang/3DSSD **Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation** - 论文:https://arxiv.org/abs/2004.03572 - 代码:https://github.com/zju3dv/disprcn **End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection** - 论文:https://arxiv.org/abs/2004.03080 - 代码:https://github.com/mileyan/pseudo-LiDAR_e2e **DSGN: Deep Stereo Geometry Network for 3D Object Detection** - 论文:https://arxiv.org/abs/2001.03398 - 代码:https://github.com/chenyilun95/DSGN **LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention** - 论文:https://arxiv.org/abs/2004.01389 - 代码:https://github.com/yinjunbo/3DVID **PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection** - 论文:https://arxiv.org/abs/1912.13192 - 代码:https://github.com/sshaoshuai/PV-RCNN **Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud** - 论文:https://arxiv.org/abs/2003.01251 - 代码:https://github.com/WeijingShi/Point-GNN <a name="Video-Object-Detection"></a> # 视频目标检测 **Memory Enhanced Global-Local Aggregation for Video Object Detection** 论文:https://arxiv.org/abs/2003.12063 代码:https://github.com/Scalsol/mega.pytorch <a name="Object-Tracking"></a> # 目标跟踪 **SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking** - 论文:https://arxiv.org/abs/1911.07241 - 代码:https://github.com/ohhhyeahhh/SiamCAR **D3S -- A Discriminative Single Shot Segmentation Tracker** - 论文:https://arxiv.org/abs/1911.08862 - 代码:https://github.com/alanlukezic/d3s **ROAM: Recurrently Optimizing Tracking Model** 
- 论文:https://arxiv.org/abs/1907.12006 - 代码:https://github.com/skyoung/ROAM **Siam R-CNN: Visual Tracking by Re-Detection** - 主页:https://www.vision.rwth-aachen.de/page/siamrcnn - 论文:https://arxiv.org/abs/1911.12836 - 论文2:https://www.vision.rwth-aachen.de/media/papers/192/siamrcnn.pdf - 代码:https://github.com/VisualComputingInstitute/SiamR-CNN **Cooling-Shrinking Attack: Blinding the Tracker with Imperceptible Noises** - 论文:https://arxiv.org/abs/2003.09595 - 代码:https://github.com/MasterBin-IIAU/CSA **High-Performance Long-Term Tracking with Meta-Updater** - 论文:https://arxiv.org/abs/2004.00305 - 代码:https://github.com/Daikenan/LTMU **AutoTrack: Towards High-Performance Visual Tracking for UAV with Automatic Spatio-Temporal Regularization** - 论文:https://arxiv.org/abs/2003.12949 - 代码:https://github.com/vision4robotics/AutoTrack **Probabilistic Regression for Visual Tracking** - 论文:https://arxiv.org/abs/2003.12565 - 代码:https://github.com/visionml/pytracking **MAST: A Memory-Augmented Self-supervised Tracker** - 论文:https://arxiv.org/abs/2002.07793 - 代码:https://github.com/zlai0/MAST **Siamese Box Adaptive Network for Visual Tracking** - 论文:https://arxiv.org/abs/2003.06761 - 代码:https://github.com/hqucv/siamban ## 多目标跟踪 **3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset** - 主页:https://vap.aau.dk/3d-zef/ - 论文:https://arxiv.org/abs/2006.08466 - 代码:https://bitbucket.org/aauvap/3d-zef/src/master/ - 数据集:https://motchallenge.net/data/3D-ZeF20 <a name="Semantic-Segmentation"></a> # 语义分割 **FDA: Fourier Domain Adaptation for Semantic Segmentation** - 论文:https://arxiv.org/abs/2004.05498 - 代码:https://github.com/YanchaoYang/FDA **Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation** - 论文:暂无 - 代码:https://github.com/JianqiangWan/Super-BPD **Single-Stage Semantic Segmentation from Image Labels** - 论文:https://arxiv.org/abs/2005.08104 - 代码:https://github.com/visinf/1-stage-wseg **Learning Texture Invariant Representation for Domain Adaptation of Semantic Segmentation** - 
论文:https://arxiv.org/abs/2003.00867 - 代码:https://github.com/MyeongJin-Kim/Learning-Texture-Invariant-Representation **MSeg: A Composite Dataset for Multi-domain Semantic Segmentation** - 论文:http://vladlen.info/papers/MSeg.pdf - 代码:https://github.com/mseg-dataset/mseg-api **CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement** - 论文:https://arxiv.org/abs/2005.02551 - 代码:https://github.com/hkchengrex/CascadePSP **Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision** - Oral - 论文:https://arxiv.org/abs/2004.07703 - 代码:https://github.com/feipan664/IntraDA **Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation** - 论文:https://arxiv.org/abs/2004.04581 - 代码:https://github.com/YudeWang/SEAM **Temporally Distributed Networks for Fast Video Segmentation** - 论文:https://arxiv.org/abs/2004.01800 - 代码:https://github.com/feinanshan/TDNet **Context Prior for Scene Segmentation** - 论文:https://arxiv.org/abs/2004.01547 - 代码:https://git.io/ContextPrior **Strip Pooling: Rethinking Spatial Pooling for Scene Parsing** - 论文:https://arxiv.org/abs/2003.13328 - 代码:https://github.com/Andrew-Qibin/SPNet **Cars Can't Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks** - 论文:https://arxiv.org/abs/2003.05128 - 代码:https://github.com/shachoi/HANet **Learning Dynamic Routing for Semantic Segmentation** - 论文:https://arxiv.org/abs/2003.10401 - 代码:https://github.com/yanwei-li/DynamicRouting <a name="Instance-Segmentation"></a> # 实例分割 **D2Det: Towards High Quality Object Detection and Instance Segmentation** - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Cao_D2Det_Towards_High_Quality_Object_Detection_and_Instance_Segmentation_CVPR_2020_paper.pdf - 代码:https://github.com/JialeCao001/D2Det **PolarMask: Single Shot Instance Segmentation with Polar Representation** - 论文:https://arxiv.org/abs/1909.13226 - 
代码:https://github.com/xieenze/PolarMask - 解读:https://zhuanlan.zhihu.com/p/84890413 **CenterMask : Real-Time Anchor-Free Instance Segmentation** - 论文:https://arxiv.org/abs/1911.06667 - 代码:https://github.com/youngwanLEE/CenterMask **BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation** - 论文:https://arxiv.org/abs/2001.00309 - 代码:https://github.com/aim-uofa/AdelaiDet **Deep Snake for Real-Time Instance Segmentation** - 论文:https://arxiv.org/abs/2001.01629 - 代码:https://github.com/zju3dv/snake **Mask Encoding for Single Shot Instance Segmentation** - 论文:https://arxiv.org/abs/2003.11712 - 代码:https://github.com/aim-uofa/AdelaiDet <a name="Panoptic-Segmentation"></a> # 全景分割 **Video Panoptic Segmentation** - 论文:https://arxiv.org/abs/2006.11339 - 代码:https://github.com/mcahny/vps - 数据集:https://www.dropbox.com/s/ecem4kq0fdkver4/cityscapes-vps-dataset-1.0.zip?dl=0 **Pixel Consensus Voting for Panoptic Segmentation** - 论文:https://arxiv.org/abs/2004.01849 - 代码:还未公布 **BANet: Bidirectional Aggregation Network with Occlusion Handling for Panoptic Segmentation** 论文:https://arxiv.org/abs/2003.14031 代码:https://github.com/Mooonside/BANet <a name="VOS"></a> # 视频目标分割 **A Transductive Approach for Video Object Segmentation** - 论文:https://arxiv.org/abs/2004.07193 - 代码:https://github.com/microsoft/transductive-vos.pytorch **State-Aware Tracker for Real-Time Video Object Segmentation** - 论文:https://arxiv.org/abs/2003.00482 - 代码:https://github.com/MegviiDetection/video_analyst **Learning Fast and Robust Target Models for Video Object Segmentation** - 论文:https://arxiv.org/abs/2003.00908 - 代码:https://github.com/andr345/frtm-vos **Learning Video Object Segmentation from Unlabeled Videos** - 论文:https://arxiv.org/abs/2003.05020 - 代码:https://github.com/carrierlxk/MuG <a name="Superpixel"></a> # 超像素分割 **Superpixel Segmentation with Fully Convolutional Networks** - 论文:https://arxiv.org/abs/2003.12929 - 代码:https://github.com/fuy34/superpixel_fcn <a name="IIS"></a> # 交互式图像分割 **Interactive Object 
Segmentation with Inside-Outside Guidance** - 论文下载链接:http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Interactive_Object_Segmentation_With_Inside-Outside_Guidance_CVPR_2020_paper.pdf - 代码:https://github.com/shiyinzhang/Inside-Outside-Guidance - 数据集:https://github.com/shiyinzhang/Pixel-ImageNet <a name="NAS"></a> # NAS **AOWS: Adaptive and optimal network width search with latency constraints** - 论文:https://arxiv.org/abs/2005.10481 - 代码:https://github.com/bermanmaxim/AOWS **Densely Connected Search Space for More Flexible Neural Architecture Search** - 论文:https://arxiv.org/abs/1906.09607 - 代码:https://github.com/JaminFong/DenseNAS **MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning** - 论文:https://arxiv.org/abs/2003.14058 - 代码:https://github.com/bhpfelix/MTLNAS **FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions** - 论文下载链接:https://arxiv.org/abs/2004.05565 - 代码:https://github.com/facebookresearch/mobile-vision **Neural Architecture Search for Lightweight Non-Local Networks** - 论文:https://arxiv.org/abs/2004.01961 - 代码:https://github.com/LiYingwei/AutoNL **Rethinking Performance Estimation in Neural Architecture Search** - 论文:https://arxiv.org/abs/2005.09917 - 代码:https://github.com/zhengxiawu/rethinking_performance_estimation_in_NAS - 解读1:https://www.zhihu.com/question/372070853/answer/1035234510 - 解读2:https://zhuanlan.zhihu.com/p/111167409 **CARS: Continuous Evolution for Efficient Neural Architecture Search** - 论文:https://arxiv.org/abs/1909.04977 - 代码(即将开源):https://github.com/huawei-noah/CARS <a name="GAN"></a> # GAN **SEAN: Image Synthesis with Semantic Region-Adaptive Normalization** - 论文:https://arxiv.org/abs/1911.12861 - 代码:https://github.com/ZPdesu/SEAN **Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation** - 
Paper link: http://openaccess.thecvf.com/content_CVPR_2020/html/Chen_Reusing_Discriminators_for_Encoding_Towards_Unsupervised_Image-to-Image_Translation_CVPR_2020_paper.html
- Code link: https://github.com/alpc91/NICE-GAN-pytorch

**Distribution-induced Bidirectional Generative Adversarial Network for Graph Representation Learning**

- Paper: https://arxiv.org/abs/1912.01899
- Code: https://github.com/SsGood/DBGAN

**PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer**

- Paper: https://arxiv.org/abs/1909.06956
- Code: https://github.com/wtjiang98/PSGAN

**Semantically Multi-modal Image Synthesis**

- Homepage: http://seanseattle.github.io/SMIS
- Paper: https://arxiv.org/abs/2003.12697
- Code: https://github.com/Seanseattle/SMIS

**Unpaired Portrait Drawing Generation via Asymmetric Cycle Mapping**

- Paper: https://yiranran.github.io/files/CVPR2020_Unpaired%20Portrait%20Drawing%20Generation%20via%20Asymmetric%20Cycle%20Mapping.pdf
- Code: https://github.com/yiranran/Unpaired-Portrait-Drawing

**Learning to Cartoonize Using White-box Cartoon Representations**

- Paper: https://github.com/SystemErrorWang/White-box-Cartoonization/blob/master/paper/06791.pdf
- Homepage: https://systemerrorwang.github.io/White-box-Cartoonization/
- Code: https://github.com/SystemErrorWang/White-box-Cartoonization
- Commentary: https://zhuanlan.zhihu.com/p/117422157
- Demo video: https://www.bilibili.com/video/av56708333

**GAN Compression: Efficient Architectures for Interactive Conditional GANs**

- Paper: https://arxiv.org/abs/2003.08936
- Code: https://github.com/mit-han-lab/gan-compression

**Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions**

- Paper: https://arxiv.org/abs/2003.01826
- Code: https://github.com/cc-hpc-itwm/UpConv

<a name="Re-ID"></a>

# Re-ID

**High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification**

- 
论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Wang_High-Order_Information_Matters_Learning_Relation_and_Topology_for_Occluded_Person_CVPR_2020_paper.html - 代码:https://github.com/wangguanan/HOReID **COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification** - 论文:https://arxiv.org/abs/2005.07862 - 数据集:暂无 **Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification With Deep Mis-Ranking** - 论文:https://arxiv.org/abs/2004.04199 - 代码:https://github.com/whj363636/Adversarial-attack-on-Person-ReID-With-Deep-Mis-Ranking **Pose-guided Visible Part Matching for Occluded Person ReID** - 论文:https://arxiv.org/abs/2004.00230 - 代码:https://github.com/hh23333/PVPM **Weakly supervised discriminative feature learning with state information for person identification** - 论文:https://arxiv.org/abs/2002.11939 - 代码:https://github.com/KovenYu/state-information <a name="3D-PointCloud"></a> # 3D点云(分类/分割/配准等) ## 3D点云卷积 **PointASNL: Robust Point Clouds Processing using Nonlocal Neural Networks with Adaptive Sampling** - 论文:https://arxiv.org/abs/2003.00492 - 代码:https://github.com/yanx27/PointASNL **Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds** - 论文下载链接:https://arxiv.org/abs/2003.12971 - 代码:https://github.com/raoyongming/PointGLR **Grid-GCN for Fast and Scalable Point Cloud Learning** - 论文:https://arxiv.org/abs/1912.02984 - 代码:https://github.com/Xharlie/Grid-GCN **FPConv: Learning Local Flattening for Point Convolution** - 论文:https://arxiv.org/abs/2002.10701 - 代码:https://github.com/lyqun/FPConv ## 3D点云分类 **PointAugment: an Auto-Augmentation Framework for Point Cloud Classification** - 论文:https://arxiv.org/abs/2002.10876 - 代码(即将开源): https://github.com/liruihui/PointAugment/ ## 3D点云语义分割 **RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds** - 论文:https://arxiv.org/abs/1911.11236 - 代码:https://github.com/QingyongHu/RandLA-Net - 解读:https://zhuanlan.zhihu.com/p/105433460 
**Weakly Supervised Semantic Point Cloud Segmentation: Towards 10X Fewer Labels**

- Paper: https://arxiv.org/abs/2004.04091
- Code: https://github.com/alex-xun-xu/WeakSupPointCloudSeg

**PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation**

- Paper: https://arxiv.org/abs/2003.14032
- Code: https://github.com/edwardzhou130/PolarSeg

**Learning to Segment 3D Point Clouds in 2D Image Space**

- Paper: https://arxiv.org/abs/2003.05593
- Code: https://github.com/WPI-VISLab/Learning-to-Segment-3D-Point-Clouds-in-2D-Image-Space

## 3D Point Cloud Instance Segmentation

**PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation**

- Paper: https://arxiv.org/abs/2004.01658
- Code: https://github.com/Jia-Research-Lab/PointGroup

## 3D Point Cloud Registration

**Feature-metric Registration: A Fast Semi-supervised Approach for Robust Point Cloud Registration without Correspondences**

- Paper: https://arxiv.org/abs/2005.01014
- Code: https://github.com/XiaoshuiHuang/fmr

**D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features**

- Paper: https://arxiv.org/abs/2003.03164
- Code: https://github.com/XuyangBai/D3Feat

**RPM-Net: Robust Point Matching using Learned Features**

- Paper: https://arxiv.org/abs/2003.13479
- Code: https://github.com/yewzijian/RPMNet

## 3D Point Cloud Completion

**Cascaded Refinement Network for Point Cloud Completion**

- Paper: https://arxiv.org/abs/2004.03327
- Code: https://github.com/xiaogangw/cascaded-point-completion

## 3D Point Cloud Object Tracking

**P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds**

- Paper: https://arxiv.org/abs/2005.13888
- Code: https://github.com/HaozheQi/P2B

## Others

**An Efficient PointLSTM for Point Clouds Based Gesture Recognition**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Min_An_Efficient_PointLSTM_for_Point_Clouds_Based_Gesture_Recognition_CVPR_2020_paper.html
- Code: https://github.com/Blueprintf/pointlstm-gesture-recognition-pytorch

<a name="Face"></a>

# Face

## Face Recognition

**CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition**

- Paper: https://arxiv.org/abs/2004.00288
- 
代码:https://github.com/HuangYG123/CurricularFace **Learning Meta Face Recognition in Unseen Domains** - 论文:https://arxiv.org/abs/2003.07733 - 代码:https://github.com/cleardusk/MFR - 解读:https://mp.weixin.qq.com/s/YZoEnjpnlvb90qSI3xdJqQ ## 人脸检测 ## 人脸活体检测 **Searching Central Difference Convolutional Networks for Face Anti-Spoofing** - 论文:https://arxiv.org/abs/2003.04092 - 代码:https://github.com/ZitongYu/CDCN ## 人脸表情识别 **Suppressing Uncertainties for Large-Scale Facial Expression Recognition** - 论文:https://arxiv.org/abs/2002.10392 - 代码(即将开源):https://github.com/kaiwang960112/Self-Cure-Network ## 人脸转正 **Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images** - 论文:https://arxiv.org/abs/2003.08124 - 代码:https://github.com/Hangz-nju-cuhk/Rotate-and-Render ## 人脸3D重建 **AvatarMe: Realistically Renderable 3D Facial Reconstruction "in-the-wild"** - 论文:https://arxiv.org/abs/2003.13845 - 数据集:https://github.com/lattas/AvatarMe **FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction** - 论文:https://arxiv.org/abs/2003.13989 - 代码:https://github.com/zhuhao-nju/facescape <a name="Human-Pose-Estimation"></a> # 人体姿态估计(2D/3D) ## 2D人体姿态估计 **TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting** - 主页:https://yzhq97.github.io/transmomo/ - 论文:https://arxiv.org/abs/2003.14401 - 代码:https://github.com/yzhq97/transmomo.pytorch **HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation** - 论文:https://arxiv.org/abs/1908.10357 - 代码:https://github.com/HRNet/HigherHRNet-Human-Pose-Estimation **The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation** - 论文:https://arxiv.org/abs/1911.07524 - 代码:https://github.com/HuangJunJie2017/UDP-Pose - 解读:https://zhuanlan.zhihu.com/p/92525039 **Distribution-Aware Coordinate Representation for Human Pose Estimation** - 主页:https://ilovepose.github.io/coco/ - 论文:https://arxiv.org/abs/1910.06278 - 
Code: https://github.com/ilovepose/DarkPose

## 3D Human Pose Estimation

**Cascaded Deep Monocular 3D Human Pose Estimation With Evolutionary Training Data**

- Paper: https://arxiv.org/abs/2006.07778
- Code: https://github.com/Nicholasli1995/EvoSkeleton

**Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach**

- Homepage: https://www.zhe-zhang.com/cvpr2020
- Paper: https://arxiv.org/abs/2003.11163
- Code: https://github.com/CHUNYUWANG/imu-human-pose-pytorch

**Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data**

- Paper: https://arxiv.org/abs/2004.01166
- Code: https://github.com/Healthcare-Robotics/bodies-at-rest
- Dataset: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KOA4ML

**Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis**

- Homepage: http://val.cds.iisc.ac.in/pgp-human/
- Paper: https://arxiv.org/abs/2004.04400

**Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation**

- Paper: https://arxiv.org/abs/2004.00329
- Code: https://github.com/fabbrimatteo/LoCO

**VIBE: Video Inference for Human Body Pose and Shape Estimation**

- Paper: https://arxiv.org/abs/1912.05656
- Code: https://github.com/mkocabas/VIBE

**Back to the Future: Joint Aware Temporal Deep Learning 3D Human Pose Estimation**

- Paper: https://arxiv.org/abs/2002.11251
- Code: https://github.com/vnmr/JointVideoPose3D

**Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS**

- Paper: https://arxiv.org/abs/2003.03972
- Dataset: not yet available

<a name="Human-Parsing"></a>

# Human Parsing

**Correlating Edge, Pose with Parsing**

- Paper: https://arxiv.org/abs/2005.01431
- Code: https://github.com/ziwei-zh/CorrPM

<a name="Scene-Text-Detection"></a>

# Scene Text Detection

**STEFANN: Scene Text Editor using Font Adaptive Neural Network**

- Homepage: https://prasunroy.github.io/stefann/
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Roy_STEFANN_Scene_Text_Editor_Using_Font_Adaptive_Neural_Network_CVPR_2020_paper.html
- Code: https://github.com/prasunroy/stefann
- Dataset: https://drive.google.com/open?id=1sEDiX_jORh2X-HSzUnjIyZr-G9LJIw1k

**ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_ContourNet_Taking_a_Further_Step_Toward_Accurate_Arbitrary-Shaped_Scene_Text_CVPR_2020_paper.pdf
- Code: https://github.com/wangyuxin87/ContourNet

**UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**

- Paper: https://arxiv.org/abs/2003.10608
- Code and dataset: https://github.com/Jyouhou/UnrealText/

**ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network**

- Paper: https://arxiv.org/abs/2002.10200
- Code (coming soon): https://github.com/Yuliang-Liu/bezier_curve_text_spotting
- Code (coming soon): https://github.com/aim-uofa/adet

**Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection**

- Paper: https://arxiv.org/abs/2003.07493
- Code: https://github.com/GXYM/DRRG

<a name="Scene-Text-Recognition"></a>

# Scene Text Recognition

**SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition**

- Paper: https://arxiv.org/abs/2005.10977
- Code: https://github.com/Pay20Y/SEED

**UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**

- Paper: https://arxiv.org/abs/2003.10608
- Code and dataset: https://github.com/Jyouhou/UnrealText/

**ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network**

- Paper: https://arxiv.org/abs/2002.10200
- Code (coming soon): https://github.com/aim-uofa/adet

**Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition**

- Paper: https://arxiv.org/abs/2003.06606
- Code: https://github.com/Canjie-Luo/Text-Image-Augmentation

<a name="Feature"></a>

# Feature (Point) Detection and Description

**SuperGlue: Learning Feature Matching with Graph Neural Networks**

- Paper: https://arxiv.org/abs/1911.11763
- Code: https://github.com/magicleap/SuperGluePretrainedNetwork

<a name="Super-Resolution"></a>

# Super-Resolution

## Image Super-Resolution

**Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Guo_Closed-Loop_Matters_Dual_Regression_Networks_for_Single_Image_Super-Resolution_CVPR_2020_paper.html
- Code: https://github.com/guoyongcs/DRN

**Learning Texture Transformer Network for Image Super-Resolution**

- Paper: https://arxiv.org/abs/2006.04139
- Code: https://github.com/FuzhiYang/TTSR

**Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining**

- Paper: https://arxiv.org/abs/2006.01424
- Code: https://github.com/SHI-Labs/Cross-Scale-Non-Local-Attention

**Structure-Preserving Super Resolution with Gradient Guidance**

- Paper: https://arxiv.org/abs/2003.13081
- Code: https://github.com/Maclory/SPSR

**Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy**

- Paper: https://arxiv.org/abs/2004.00448
- Code: https://github.com/clovaai/cutblur

## Video Super-Resolution

**TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution**

- Paper: https://arxiv.org/abs/1812.02898
- Code: https://github.com/YapengTian/TDAN-VSR-CVPR-2020

**Space-Time-Aware Multi-Resolution Video Enhancement**

- Homepage: https://alterzero.github.io/projects/STAR.html
- Paper: http://arxiv.org/abs/2003.13170
- Code: https://github.com/alterzero/STARnet

**Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution**

- Paper: https://arxiv.org/abs/2002.11616
- Code: https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020

<a name="Model-Compression"></a>

# Model Compression/Pruning

**DMCP: Differentiable Markov Channel Pruning for Neural Networks**

- Paper: https://arxiv.org/abs/2005.03354
- Code: https://github.com/zx55/dmcp

**Forward and Backward Information Retention for Accurate Binary Neural Networks**

- Paper: https://arxiv.org/abs/1909.10788
- Code: https://github.com/htqin/IR-Net

**Towards Efficient Model Compression via Learned Global Ranking**

- Paper: https://arxiv.org/abs/1904.12368
- Code: https://github.com/cmu-enyac/LeGR

**HRank: Filter Pruning using High-Rank Feature Map**

- Paper: http://arxiv.org/abs/2002.10179
- Code: https://github.com/lmbxmu/HRank

**GAN Compression: Efficient Architectures for Interactive Conditional GANs**

- Paper: https://arxiv.org/abs/2003.08936
- Code: https://github.com/mit-han-lab/gan-compression

**Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression**

- Paper: https://arxiv.org/abs/2003.08935
- Code: https://github.com/ofsoundof/group_sparsity

<a name="Action-Recognition"></a>

# Video Understanding/Action Recognition

**Oops! Predicting Unintentional Action in Video**

- Homepage: https://oops.cs.columbia.edu/
- Paper: https://arxiv.org/abs/1911.11206
- Code: https://github.com/cvlab-columbia/oops
- Dataset: https://oops.cs.columbia.edu/data

**PREDICT & CLUSTER: Unsupervised Skeleton Based Action Recognition**

- Paper: https://arxiv.org/abs/1911.12409
- Code: https://github.com/shlizee/Predict-Cluster

**Intra- and Inter-Action Understanding via Temporal Action Parsing**

- Paper: https://arxiv.org/abs/2005.10229
- Homepage and dataset: https://sdolivia.github.io/TAPOS/

**3DV: 3D Dynamic Voxel for Action Recognition in Depth Video**

- Paper: https://arxiv.org/abs/2005.05501
- Code: https://github.com/3huo/3DV-Action

**FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding**

- Homepage: https://sdolivia.github.io/FineGym/
- Paper: https://arxiv.org/abs/2004.06704

**TEA: Temporal Excitation and Aggregation for Action Recognition**

- Paper: https://arxiv.org/abs/2004.01398
- Code: https://github.com/Phoenix1327/tea-action-recognition

**X3D: Expanding Architectures for Efficient Video Recognition**

- Paper: https://arxiv.org/abs/2004.04730
- Code: https://github.com/facebookresearch/SlowFast

**Temporal Pyramid Network for Action Recognition**

- Homepage: https://decisionforce.github.io/TPN
- Paper: https://arxiv.org/abs/2004.03548
- Code: https://github.com/decisionforce/TPN

## Skeleton-Based Action Recognition

**Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition**

- Paper: https://arxiv.org/abs/2003.14111
- Code: https://github.com/kenziyuliu/ms-g3d

<a name="Crowd-Counting"></a>

# Crowd Counting

<a name="Depth-Estimation"></a>

# Depth Estimation

**BiFuse: Monocular 360° Depth Estimation via Bi-Projection Fusion**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_BiFuse_Monocular_360_Depth_Estimation_via_Bi-Projection_Fusion_CVPR_2020_paper.pdf
- Code: https://github.com/Yeh-yu-hsuan/BiFuse

**Focus on defocus: bridging the synthetic to real domain gap for depth estimation**

- Paper: https://arxiv.org/abs/2005.09623
- Code: https://github.com/dvl-tum/defocus-net

**Bi3D: Stereo Depth Estimation via Binary Classifications**

- Paper: https://arxiv.org/abs/2005.07274
- Code: https://github.com/NVlabs/Bi3D

**AANet: Adaptive Aggregation Network for Efficient Stereo Matching**

- Paper: https://arxiv.org/abs/2004.09548
- Code: https://github.com/haofeixu/aanet

**Towards Better Generalization: Joint Depth-Pose Learning without PoseNet**

- Paper: https://github.com/B1ueber2y/TrianFlow
- Code: https://github.com/B1ueber2y/TrianFlow

## Monocular Depth Estimation

**On the uncertainty of self-supervised monocular depth estimation**

- Paper: https://arxiv.org/abs/2005.06209
- Code: https://github.com/mattpoggi/mono-uncertainty

**3D Packing for Self-Supervised Monocular Depth Estimation**

- Paper: https://arxiv.org/abs/1905.02693
- Code: https://github.com/TRI-ML/packnet-sfm
- Demo video: https://www.bilibili.com/video/av70562892/

**Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation**

- Paper: https://arxiv.org/abs/2002.12114
- Code: https://github.com/yzhao520/ARC

<a name="6DOF"></a>

# 6D Object Pose Estimation

**PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/He_PVN3D_A_Deep_Point-Wise_3D_Keypoints_Voting_Network_for_6DoF_CVPR_2020_paper.pdf
- Code: https://github.com/ethnhe/PVN3D

**MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion**

- Paper: https://arxiv.org/abs/2004.04336
- Code: https://github.com/wkentaro/morefusion

**EPOS: Estimating 6D Pose of Objects with Symmetries**

- Homepage: http://cmp.felk.cvut.cz/epos
- Paper: https://arxiv.org/abs/2004.00605

**G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features**

- Paper: https://arxiv.org/abs/2003.11089
- Code: https://github.com/DC1991/G2L_Net

<a name="Hand-Pose"></a>

# Hand Pose Estimation

**HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation**

- Paper: https://arxiv.org/abs/2004.00060
- Homepage: http://vision.sice.indiana.edu/projects/hopenet

**Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data**

- Paper: https://arxiv.org/abs/2003.09572
- Code: https://github.com/CalciferZh/minimal-hand

<a name="Saliency"></a>

# Saliency Detection

**JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection**

- Paper: https://arxiv.org/abs/2004.08515
- Code: https://github.com/kerenfu/JLDCF/

**UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders**

- Homepage: http://dpfan.net/d3netbenchmark/
- Paper: https://arxiv.org/abs/2004.05763
- Code: https://github.com/JingZhang617/UCNet

<a name="Denoising"></a>

# Denoising

**A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising**

- Paper: https://arxiv.org/abs/2003.12751
- Code: https://github.com/Vandermode/NoiseModel

**CycleISP: Real Image Restoration via Improved Data Synthesis**

- Paper: https://arxiv.org/abs/2003.07761
- Code: https://github.com/swz30/CycleISP

<a name="Deraining"></a>

# Deraining

**Multi-Scale Progressive Fusion Network for Single Image Deraining**

- Paper: https://arxiv.org/abs/2003.10985
- Code: https://github.com/kuihua/MSPFN

**Detail-recovery Image Deraining via Context Aggregation Networks**

- Paper: https://openaccess.thecvf.com/content_CVPR_2020/html/Deng_Detail-recovery_Image_Deraining_via_Context_Aggregation_Networks_CVPR_2020_paper.html
- Code: https://github.com/Dengsgithub/DRD-Net

<a name="Deblurring"></a>

# Deblurring

## Video Deblurring

**Cascaded Deep Video Deblurring Using Temporal Sharpness Prior**

- Homepage: https://csbhr.github.io/projects/cdvd-tsp/index.html
- Paper: https://arxiv.org/abs/2004.02501
- Code: https://github.com/csbhr/CDVD-TSP

<a name="Dehazing"></a>

# Dehazing

**Domain Adaptation for Image Dehazing**

- Paper: https://arxiv.org/abs/2005.04668
- Code: https://github.com/HUSTSYJ/DA_dahazing

**Multi-Scale Boosted Dehazing Network with Dense Feature Fusion**

- Paper: https://arxiv.org/abs/2004.13388
- Code: https://github.com/BookerDeWitt/MSBDN-DFF

<a name="Feature"></a>

# Feature Point Detection and Description

**ASLFeat: Learning Local Features of Accurate Shape and Localization**

- Paper: https://arxiv.org/abs/2003.10071
- Code: https://github.com/lzx551402/aslfeat

<a name="VQA"></a>

# Visual Question Answering (VQA)

**VC R-CNN: Visual Commonsense R-CNN**

- Paper: https://arxiv.org/abs/2002.12204
- Code: https://github.com/Wangt-CN/VC-R-CNN

<a name="VideoQA"></a>

# Video Question Answering (VideoQA)

**Hierarchical Conditional Relation Networks for Video Question Answering**

- Paper: https://arxiv.org/abs/2002.10698
- Code: https://github.com/thaolmk54/hcrn-videoqa

<a name="VLN"></a>

# Vision-and-Language Navigation

**Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training**

- Paper: https://arxiv.org/abs/2002.10638
- Code (coming soon): https://github.com/weituo12321/PREVALENT

<a name="Video-Compression"></a>

# Video Compression

**Learning for Video Compression with Hierarchical Quality and Recurrent Enhancement**

- Paper: https://arxiv.org/abs/2003.01966
- Code: https://github.com/RenYang-home/HLVC

<a name="Video-Frame-Interpolation"></a>

# Video Frame Interpolation

**AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation**

- Paper: https://arxiv.org/abs/1907.10244
- Code: https://github.com/HyeongminLEE/AdaCoF-pytorch

**FeatureFlow: Robust Video Interpolation via Structure-to-Texture Generation**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Gui_FeatureFlow_Robust_Video_Interpolation_via_Structure-to-Texture_Generation_CVPR_2020_paper.html
- Code: https://github.com/CM-BF/FeatureFlow

**Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution**

- Paper: https://arxiv.org/abs/2002.11616
- Code: https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020

**Space-Time-Aware Multi-Resolution Video Enhancement**

- Homepage: https://alterzero.github.io/projects/STAR.html
- Paper: http://arxiv.org/abs/2003.13170
- Code: https://github.com/alterzero/STARnet

**Scene-Adaptive Video Frame Interpolation via Meta-Learning**

- Paper: https://arxiv.org/abs/2004.00779
- Code: https://github.com/myungsub/meta-interpolation

**Softmax Splatting for Video Frame Interpolation**

- Homepage: http://sniklaus.com/papers/softsplat
- Paper: https://arxiv.org/abs/2003.05534
- Code: https://github.com/sniklaus/softmax-splatting

<a name="Style-Transfer"></a>

# Style Transfer

**Diversified Arbitrary Style Transfer via Deep Feature Perturbation**

- Paper: https://arxiv.org/abs/1909.08223
- Code: https://github.com/EndyWon/Deep-Feature-Perturbation

**Collaborative Distillation for Ultra-Resolution Universal Style Transfer**

- Paper: https://arxiv.org/abs/2003.08436
- Code: https://github.com/mingsun-tse/collaborative-distillation

<a name="Lane-Detection"></a>

# Lane Detection

**Inter-Region Affinity Distillation for Road Marking Segmentation**

- Paper: https://arxiv.org/abs/2004.05304
- Code: https://github.com/cardwing/Codes-for-IntRA-KD

<a name="HOI"></a>

# Human-Object Interaction (HOI) Detection

**PPDM: Parallel Point Detection and Matching for Real-time Human-Object Interaction Detection**

- Paper: https://arxiv.org/abs/1912.12898
- Code: https://github.com/YueLiao/PPDM

**Detailed 2D-3D Joint Representation for Human-Object Interaction**

- Paper: https://arxiv.org/abs/2004.08154
- Code: https://github.com/DirtyHarryLYL/DJ-RN

**Cascaded Human-Object Interaction Recognition**

- Paper: https://arxiv.org/abs/2003.04262
- Code: https://github.com/tfzhou/C-HOI

**VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions**

- Paper: https://arxiv.org/abs/2003.05541
- Code: https://github.com/ASMIftekhar/VSGNet

<a name="TP"></a>

# Trajectory Prediction

**The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction**

- Paper: https://arxiv.org/abs/1912.06445
- Code: https://github.com/JunweiLiang/Multiverse
- Dataset: https://next.cs.cmu.edu/multiverse/

**Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction**

- Paper: https://arxiv.org/abs/2002.11927
- Code: https://github.com/abduallahmohamed/Social-STGCNN

<a name="Motion-Predication"></a>

# Motion Prediction

**Collaborative Motion Prediction via Neural Motion Message Passing**

- Paper: https://arxiv.org/abs/2003.06594
- Code: https://github.com/PhyllisH/NMMP

**MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps**

- Paper: https://arxiv.org/abs/2003.06754
- Code: https://github.com/pxiangwu/MotionNet

<a name="OF"></a>

# Optical Flow Estimation

**Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation**

- Paper: https://arxiv.org/abs/2003.13045
- Code: https://github.com/lliuz/ARFlow

<a name="IR"></a>

# Image Retrieval

**Evade Deep Image Retrieval by Stashing Private Images in the Hash Space**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Xiao_Evade_Deep_Image_Retrieval_by_Stashing_Private_Images_in_the_CVPR_2020_paper.html
- Code: https://github.com/sugarruy/hashstash

<a name="Virtual-Try-On"></a>

# Virtual Try-On

**Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content**

- Paper: https://arxiv.org/abs/2003.05863
- Code: https://github.com/switchablenorms/DeepFashion_Try_On

<a name="HDR"></a>

# HDR

**Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline**

- Homepage: https://www.cmlab.csie.ntu.edu.tw/~yulunliu/SingleHDR
- Paper: https://www.cmlab.csie.ntu.edu.tw/~yulunliu/SingleHDR_/00942.pdf
- Code: https://github.com/alex04072000/SingleHDR

<a name="AE"></a>

# Adversarial Examples

**Enhancing Cross-Task Black-Box Transferability of Adversarial Examples With Dispersion Reduction**

- Paper: https://openaccess.thecvf.com/content_CVPR_2020/papers/Lu_Enhancing_Cross-Task_Black-Box_Transferability_of_Adversarial_Examples_With_Dispersion_Reduction_CVPR_2020_paper.pdf
- Code: https://github.com/erbloo/dr_cvpr20

**Towards Large yet Imperceptible Adversarial Image Perturbations with Perceptual Color Distance**

- Paper: https://arxiv.org/abs/1911.02466
- Code: https://github.com/ZhengyuZhao/PerC-Adversarial

<a name="3D-Reconstructing"></a>

# 3D Reconstruction

**Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild**

- **CVPR 2020 Best Paper**
- Homepage: https://elliottwu.com/projects/unsup3d/
- Paper: https://arxiv.org/abs/1911.11130
- Code: https://github.com/elliottwu/unsup3d

**Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization**

- Homepage: https://shunsukesaito.github.io/PIFuHD/
- Paper: https://arxiv.org/abs/2004.00452
- Code: https://github.com/facebookresearch/pifuhd

**TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Patel_TailorNet_Predicting_Clothing_in_3D_as_a_Function_of_Human_CVPR_2020_paper.pdf
- Code: https://github.com/chaitanya100100/TailorNet
- Dataset: https://github.com/zycliao/TailorNet_dataset

**Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Chibane_Implicit_Functions_in_Feature_Space_for_3D_Shape_Reconstruction_and_CVPR_2020_paper.pdf
- Code: https://github.com/jchibane/if-net

**Learning to Transfer Texture From Clothing Images to 3D Humans**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Mir_Learning_to_Transfer_Texture_From_Clothing_Images_to_3D_Humans_CVPR_2020_paper.pdf
- Code: https://github.com/aymenmir1/pix2surf

<a name="DC"></a>

# Depth Completion

**Uncertainty-Aware CNNs for Depth Completion: Uncertainty from Beginning to End**

- Paper: https://arxiv.org/abs/2006.03349
- Code: https://github.com/abdo-eldesokey/pncnn

<a name="SSC"></a>

# Semantic Scene Completion

**3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior**

- Paper: https://arxiv.org/abs/2003.14052
- Code: https://github.com/charlesCXK/TorchSSC

<a name="Captioning"></a>

# Image/Video Captioning

**Syntax-Aware Action Targeting for Video Captioning**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Zheng_Syntax-Aware_Action_Targeting_for_Video_Captioning_CVPR_2020_paper.pdf
- Code: https://github.com/SydCaption/SAAT

<a name="WP"></a>

# Wireframe Parsing

**Holistically-Attracted Wireframe Parsing**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Xue_Holistically-Attracted_Wireframe_Parsing_CVPR_2020_paper.html
- Code: https://github.com/cherubicXN/hawp

<a name="Datasets"></a>

# Datasets

**OASIS: A Large-Scale Dataset for Single Image 3D in the Wild**

- Paper: https://arxiv.org/abs/2007.13215
- Dataset: https://oasis.cs.princeton.edu/

**STEFANN: Scene Text Editor using Font Adaptive Neural Network**

- Homepage: https://prasunroy.github.io/stefann/
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Roy_STEFANN_Scene_Text_Editor_Using_Font_Adaptive_Neural_Network_CVPR_2020_paper.html
- Code: https://github.com/prasunroy/stefann
- Dataset: https://drive.google.com/open?id=1sEDiX_jORh2X-HSzUnjIyZr-G9LJIw1k

**Interactive Object Segmentation with Inside-Outside Guidance**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Interactive_Object_Segmentation_With_Inside-Outside_Guidance_CVPR_2020_paper.pdf
- Code: https://github.com/shiyinzhang/Inside-Outside-Guidance
- Dataset: https://github.com/shiyinzhang/Pixel-ImageNet

**Video Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2006.11339
- Code: https://github.com/mcahny/vps
- Dataset: https://www.dropbox.com/s/ecem4kq0fdkver4/cityscapes-vps-dataset-1.0.zip?dl=0

**FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Li_FSS-1000_A_1000-Class_Dataset_for_Few-Shot_Segmentation_CVPR_2020_paper.html
- Code: https://github.com/HKUSTCV/FSS-1000
- Dataset: https://github.com/HKUSTCV/FSS-1000

**3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset**

- Homepage: https://vap.aau.dk/3d-zef/
- Paper: https://arxiv.org/abs/2006.08466
- Code: https://bitbucket.org/aauvap/3d-zef/src/master/
- Dataset: https://motchallenge.net/data/3D-ZeF20

**TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Patel_TailorNet_Predicting_Clothing_in_3D_as_a_Function_of_Human_CVPR_2020_paper.pdf
- Code: https://github.com/chaitanya100100/TailorNet
- Dataset: https://github.com/zycliao/TailorNet_dataset

**Oops! Predicting Unintentional Action in Video**

- Homepage: https://oops.cs.columbia.edu/
- Paper: https://arxiv.org/abs/1911.11206
- Code: https://github.com/cvlab-columbia/oops
- Dataset: https://oops.cs.columbia.edu/data

**The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction**

- Paper: https://arxiv.org/abs/1912.06445
- Code: https://github.com/JunweiLiang/Multiverse
- Dataset: https://next.cs.cmu.edu/multiverse/

**Open Compound Domain Adaptation**

- Homepage: https://liuziwei7.github.io/projects/CompoundDomain.html
- Dataset: https://drive.google.com/drive/folders/1_uNTF8RdvhS_sqVTnYx17hEOQpefmE2r?usp=sharing
- Paper: https://arxiv.org/abs/1909.03403
- Code: https://github.com/zhmiao/OpenCompoundDomainAdaptation-OCDA

**Intra- and Inter-Action Understanding via Temporal Action Parsing**

- Paper: https://arxiv.org/abs/2005.10229
- Homepage and dataset: https://sdolivia.github.io/TAPOS/

**Dynamic Refinement Network for Oriented and Densely Packed Object Detection**

- Paper: https://arxiv.org/abs/2005.09973
- Code and dataset: https://github.com/Anymake/DRN_CVPR2020

**COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification**

- Paper: https://arxiv.org/abs/2005.07862
- Dataset: not yet available

**KeypointNet: A Large-scale 3D Keypoint Dataset Aggregated from Numerous Human Annotations**

- Paper: https://arxiv.org/abs/2002.12687
- Dataset: https://github.com/qq456cvb/KeypointNet

**MSeg: A Composite Dataset for Multi-domain Semantic Segmentation**

- Paper: http://vladlen.info/papers/MSeg.pdf
- Code: https://github.com/mseg-dataset/mseg-api
- Dataset: https://github.com/mseg-dataset/mseg-semantic

**AvatarMe: Realistically Renderable 3D Facial Reconstruction "in-the-wild"**

- Paper: https://arxiv.org/abs/2003.13845
- Dataset: https://github.com/lattas/AvatarMe

**Learning to Autofocus**

- Paper: https://arxiv.org/abs/2004.12260
- Dataset: not yet available

**FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction**

- Paper: https://arxiv.org/abs/2003.13989
- Code: https://github.com/zhuhao-nju/facescape

**Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data**

- Paper: https://arxiv.org/abs/2004.01166
- Code: https://github.com/Healthcare-Robotics/bodies-at-rest
- Dataset: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KOA4ML

**FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding**

- Homepage: https://sdolivia.github.io/FineGym/
- Paper: https://arxiv.org/abs/2004.06704

**A Local-to-Global Approach to Multi-modal Movie Scene Segmentation**

- Homepage: https://anyirao.com/projects/SceneSeg.html
- Paper: https://arxiv.org/abs/2004.02678
- Code: https://github.com/AnyiRao/SceneSeg

**Deep Homography Estimation for Dynamic Scenes**

- Paper: https://arxiv.org/abs/2004.02132
- Dataset: https://github.com/lcmhoang/hmg-dynamics

**Assessing Image Quality Issues for Real-World Problems**

- Homepage: https://vizwiz.org/tasks-and-datasets/image-quality-issues/
- Paper: https://arxiv.org/abs/2003.12511

**UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**

- Paper: https://arxiv.org/abs/2003.10608
- Code and dataset: https://github.com/Jyouhou/UnrealText/

**PANDA: A Gigapixel-level Human-centric Video Dataset**

- Paper: https://arxiv.org/abs/2003.04852
- Dataset: http://www.panda-dataset.com/

**IntrA: 3D Intracranial Aneurysm Dataset for Deep Learning**

- Paper: https://arxiv.org/abs/2003.02920
- Dataset: https://github.com/intra3d2019/IntrA

**Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS**

- Paper: https://arxiv.org/abs/2003.03972
- Dataset: not yet available

<a name="Others"></a>

# Others

**CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Kluger_CONSAC_Robust_Multi-Model_Fitting_by_Conditional_Sample_Consensus_CVPR_2020_paper.html
- Code: https://github.com/fkluger/consac

**Learning to Learn Single Domain Generalization**

- Paper: https://arxiv.org/abs/2003.13216
- Code: https://github.com/joffery/M-ADA

**Open Compound Domain Adaptation**

- Homepage: https://liuziwei7.github.io/projects/CompoundDomain.html
- Dataset: https://drive.google.com/drive/folders/1_uNTF8RdvhS_sqVTnYx17hEOQpefmE2r?usp=sharing
- Paper: https://arxiv.org/abs/1909.03403
- Code: https://github.com/zhmiao/OpenCompoundDomainAdaptation-OCDA

**Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision**

- Paper: http://www.cvlibs.net/publications/Niemeyer2020CVPR.pdf
- Code: https://github.com/autonomousvision/differentiable_volumetric_rendering

**QEBA: Query-Efficient Boundary-Based Blackbox Attack**

- Paper: https://arxiv.org/abs/2005.14137
- Code: https://github.com/AI-secure/QEBA

**Equalization Loss for Long-Tailed Object Recognition**

- Paper: https://arxiv.org/abs/2003.05176
- Code: https://github.com/tztztztztz/eql.detectron2

**Instance-aware Image Colorization**

- Homepage: https://ericsujw.github.io/InstColorization/
- Paper: https://arxiv.org/abs/2005.10825
- Code: https://github.com/ericsujw/InstColorization

**Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting**

- Paper: https://arxiv.org/abs/2005.09704
- Code: https://github.com/Atlas200dk/sample-imageinpainting-HiFill

**Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching**

- Paper: https://arxiv.org/abs/2005.03860
- Code: https://github.com/shiyujiao/cross_view_localization_DSM

**Epipolar Transformers**

- Paper: https://arxiv.org/abs/2005.04551
- Code: https://github.com/yihui-he/epipolar-transformers

**Bringing Old Photos Back to Life**

- Homepage: http://raywzy.com/Old_Photo/
- Paper: https://arxiv.org/abs/2004.09484

**MaskFlownet: Asymmetric Feature Matching with Learnable Occlusion Mask**

- Paper: https://arxiv.org/abs/2003.10955
- Code: https://github.com/microsoft/MaskFlownet

**Self-Supervised Viewpoint Learning from Image Collections**

- Paper: https://arxiv.org/abs/2004.01793
- Paper 2: https://research.nvidia.com/sites/default/files/pubs/2020-03_Self-Supervised-Viewpoint-Learning/SSV-CVPR2020.pdf
- Code: https://github.com/NVlabs/SSV

**Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations**

- Oral
- Paper: https://arxiv.org/abs/2003.12237
- Code: https://github.com/cuishuhao/BNM

**Towards Learning Structure via Consensus for Face Segmentation and Parsing**

- Paper: https://arxiv.org/abs/1911.00957
- Code: https://github.com/isi-vista/structure_via_consensus

**Plug-and-Play Algorithms for Large-scale Snapshot Compressive Imaging**

- Oral
- Paper: https://arxiv.org/abs/2003.13654
- Code: https://github.com/liuyang12/PnP-SCI

**Lightweight Photometric Stereo for Facial Details Recovery**

- Paper: https://arxiv.org/abs/2003.12307
- Code: https://github.com/Juyong/FacePSNet

**Footprints and Free Space from a Single Color Image**

- Paper: https://arxiv.org/abs/2004.06376
- Code: https://github.com/nianticlabs/footprints

**Self-Supervised Monocular Scene Flow Estimation**

- Paper: https://arxiv.org/abs/2004.04143
- Code: https://github.com/visinf/self-mono-sf

**Quasi-Newton Solver for Robust Non-Rigid Registration**

- Paper: https://arxiv.org/abs/2004.04322
- Code: https://github.com/Juyong/Fast_RNRR

**A Local-to-Global Approach to Multi-modal Movie Scene Segmentation**

- Homepage: https://anyirao.com/projects/SceneSeg.html
- Paper: https://arxiv.org/abs/2004.02678
- Code: https://github.com/AnyiRao/SceneSeg

**DeepFLASH: An Efficient Network for Learning-based Medical Image Registration**

- Paper: https://arxiv.org/abs/2004.02097
- Code: https://github.com/jw4hv/deepflash

**Self-Supervised Scene De-occlusion**

- Homepage: https://xiaohangzhan.github.io/projects/deocclusion/
- Paper: https://arxiv.org/abs/2004.02788
- Code: https://github.com/XiaohangZhan/deocclusion

**Polarized Reflection Removal with Perfect Alignment in the Wild**

- Homepage: https://leichenyang.weebly.com/project-polarized.html
- Code: https://github.com/ChenyangLEI/CVPR2020-Polarized-Reflection-Removal-with-Perfect-Alignment

**Background Matting: The World is Your Green Screen**

- Paper: https://arxiv.org/abs/2004.00626
- Code: http://github.com/senguptaumd/Background-Matting

**What Deep CNNs Benefit from Global Covariance Pooling: An Optimization Perspective**

- Paper: https://arxiv.org/abs/2003.11241
- Code: https://github.com/ZhangLi-CS/GCP_Optimization

**Look-into-Object: Self-supervised Structure Modeling for Object Recognition**

- Paper: not yet available
- Code: https://github.com/JDAI-CV/LIO

**Video Object Grounding using Semantic Roles in Language Description**

- Paper: https://arxiv.org/abs/2003.10606
- Code: https://github.com/TheShadow29/vognet-pytorch

**Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives**

- Paper: https://arxiv.org/abs/2003.10739
- Code: https://github.com/d-li14/DHM

**SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization**

- Paper: http://www.cs.umd.edu/~yuejiang/papers/SDFDiff.pdf
- Code: https://github.com/YueJiang-nj/CVPR2020-SDFDiff

**On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location**

- Paper: https://arxiv.org/abs/2003.07064
- Code: https://github.com/oskyhn/CNNs-Without-Borders

**GhostNet: More Features from Cheap Operations**

- Paper: https://arxiv.org/abs/1911.11907
- Code: https://github.com/iamhankai/ghostnet

**AdderNet: Do We Really Need Multiplications in Deep Learning?**

- Paper: https://arxiv.org/abs/1912.13200
- Code: https://github.com/huawei-noah/AdderNet

**Deep Image Harmonization via Domain Verification**

- Paper: https://arxiv.org/abs/1911.13239
- Code: https://github.com/bcmi/Image_Harmonization_Datasets

**Blurry Video Frame Interpolation**

- Paper: https://arxiv.org/abs/2002.12259
- Code: https://github.com/laomao0/BIN

**Extremely Dense Point Correspondences using a Learned Feature Descriptor**

- Paper: https://arxiv.org/abs/2003.00619
- Code: https://github.com/lppllppl920/DenseDescriptorLearning-Pytorch

**Filter Grafting for Deep Neural Networks**

- Paper: https://arxiv.org/abs/2001.05868
- Code: https://github.com/fxmeng/filter-grafting
- Paper walkthrough: https://www.zhihu.com/question/372070853/answer/1041569335

**Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation**

- Paper: https://arxiv.org/abs/2003.02824
- Code: https://github.com/cmhungsteve/SSTDA

**Detecting Attended Visual Targets in Video**

- Paper: https://arxiv.org/abs/2003.02501
- Code: https://github.com/ejcgt/attention-target-detection

**Deep Image Spatial Transformation for Person Image Generation**

- Paper: https://arxiv.org/abs/2003.00696
- Code: https://github.com/RenYurui/Global-Flow-Local-Attention

**Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications**

- Paper: https://arxiv.org/abs/2003.01455
- Code: https://github.com/bbrattoli/ZeroShotVideoClassification

https://github.com/charlesCXK/3D-SketchAware-SSC

https://github.com/Anonymous20192020/Anonymous_CVPR5767

https://github.com/avirambh/ScopeFlow

https://github.com/csbhr/CDVD-TSP

https://github.com/ymcidence/TBH

https://github.com/yaoyao-liu/mnemonics

https://github.com/meder411/Tangent-Images

https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch

https://github.com/sjmoran/deep_local_parametric_filters

https://github.com/bermanmaxim/AOWS

https://github.com/dc3ea9f/look-into-object

<a name="Not-Sure"></a>

# Not Sure (Acceptance Unconfirmed)

**FADNet: A Fast and Accurate Network for Disparity Estimation**

- Paper: not yet released
- Code: https://github.com/HKBU-HPML/FADNet

https://github.com/rFID-submit/RandomFID (acceptance unconfirmed)

https://github.com/JackSyu/AE-MSR (acceptance unconfirmed)

https://github.com/fastconvnets/cvpr2020 (acceptance unconfirmed)

https://github.com/aimagelab/meshed-memory-transformer (acceptance unconfirmed)

https://github.com/TWSFar/CRGNet (acceptance unconfirmed)

https://github.com/CVPR-2020/CDARTS (acceptance unconfirmed)

https://github.com/anucvml/ddn-cvprw2020 (acceptance unconfirmed)

https://github.com/dl-model-recommend/model-trust (acceptance unconfirmed)

https://github.com/apratimbhattacharyya18/CVPR-2020-Corr-Prior (acceptance unconfirmed)

https://github.com/onetcvpr/O-Net (acceptance unconfirmed)

https://github.com/502463708/Microcalcification_Detection (acceptance unconfirmed)

https://github.com/anonymous-for-review/cvpr-2020-deep-smoke-machine (acceptance unconfirmed)

https://github.com/anonymous-for-review/cvpr-2020-smoke-recognition-dataset (acceptance unconfirmed)

https://github.com/cvpr-nonrigid/dataset (acceptance unconfirmed)

https://github.com/theFool32/PPBA (acceptance unconfirmed)

https://github.com/Realtime-Action-Recognition/Realtime-Action-Recognition

================================================
FILE: CVPR2021-Papers-with-Code.md
================================================
# CVPR 2021 Papers and Open-Source Projects (Papers with Code)

A collection of [CVPR 2021](http://cvpr2021.thecvf.com/) papers and open-source projects (papers with code)!

CVPR 2021 accepted paper list: http://cvpr2021.thecvf.com/sites/default/files/2021-03/accepted_paper_ids.txt

> Note 1: Everyone is welcome to open an issue to share CVPR 2021 papers and open-source projects!
> > 注2:关于往年CV顶会论文以及其他优质CV论文和大盘点,详见: https://github.com/amusi/daily-paper-computer-vision 如果你想了解最新最优质的的CV论文、开源项目和学习资料,欢迎扫码加入【CVer学术交流群】!互相学习,一起进步~  ## 【CVPR 2021 论文开源目录】 - [Best Paper](#Best-Paper) - [Backbone](#Backbone) - [NAS](#NAS) - [GAN](#GAN) - [VAE](#VAE) - [Visual Transformer](#Visual-Transformer) - [Regularization](#Regularization) - [SLAM](#SLAM) - [长尾分布(Long-Tailed)](#Long-Tailed) - [数据增广(Data Augmentation)](#DA) - [无监督/自监督(Self-Supervised)](#Un/Self-Supervised) - [半监督(Semi-Supervised)](#Semi-Supervised) - [胶囊网络(Capsule Network)](#Capsule-Network) - [图像分类(Image Classification](#Image-Classification) - [2D目标检测(Object Detection)](#Object-Detection) - [单/多目标跟踪(Object Tracking)](#Object-Tracking) - [语义分割(Semantic Segmentation)](#Semantic-Segmentation) - [实例分割(Instance Segmentation)](#Instance-Segmentation) - [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation) - [医学图像分割(Medical Image Segmentation)](#Medical-Image-Segmentation) - [视频目标分割(Video-Object-Segmentation)](#VOS) - [交互式视频目标分割(Interactive-Video-Object-Segmentation)](#IVOS) - [显著性检测(Saliency Detection)](#Saliency-Detection) - [伪装物体检测(Camouflaged Object Detection)](#Camouflaged-Object-Detection) - [协同显著性检测(Co-Salient Object Detection)](#CoSOD) - [图像抠图(Image Matting)](#Matting) - [行人重识别(Person Re-identification)](#Re-ID) - [行人搜索(Person Search)](#Person-Search) - [视频理解/行为识别(Video Understanding)](#Video-Understanding) - [人脸识别(Face Recognition)](#Face-Recognition) - [人脸检测(Face Detection)](#Face-Detection) - [人脸活体检测(Face Anti-Spoofing)](#Face-Anti-Spoofing) - [Deepfake检测(Deepfake Detection)](#Deepfake-Detection) - [人脸年龄估计(Age-Estimation)](#Age-Estimation) - [人脸表情识别(Facial-Expression-Recognition)](#FER) - [Deepfakes](#Deepfakes) - [人体解析(Human Parsing)](#Human-Parsing) - [2D/3D人体姿态估计(2D/3D Human Pose Estimation)](#Human-Pose-Estimation) - [动物姿态估计(Animal Pose Estimation)](#Animal-Pose-Estimation) - [手部姿态估计(Hand Pose Estimation)](#Hand-Pose-Estimation) - [Human Volumetric Capture](#Human-Volumetric-Capture) - 
[Scene Text Recognition](#Scene-Text-Recognition)
- [Image Compression](#Image-Compression)
- [Model Compression/Pruning/Quantization](#Model-Compression)
- [Knowledge Distillation](#KD)
- [Super-Resolution](#Super-Resolution)
- [Dehazing](#Dehazing)
- [Image Restoration](#Image-Restoration)
- [Image Inpainting](#Image-Inpainting)
- [Image Editing](#Image-Editing)
- [Image Captioning](#Image-Captioning)
- [Font Generation](#Font-Generation)
- [Image Matching](#Image-Matching)
- [Image Blending](#Image-Blending)
- [Reflection Removal](#Reflection-Removal)
- [3D Point Cloud Classification](#3D-C)
- [3D Object Detection](#3D-Object-Detection)
- [3D Semantic Segmentation](#3D-Semantic-Segmentation)
- [3D Panoptic Segmentation](#3D-Panoptic-Segmentation)
- [3D Object Tracking](#3D-Object-Tracking)
- [3D Point Cloud Registration](#3D-PointCloud-Registration)
- [3D Point Cloud Completion](#3D-Point-Cloud-Completion)
- [3D Reconstruction](#3D-Reconstruction)
- [6D Pose Estimation](#6D-Pose-Estimation)
- [Camera Pose Estimation](#Camera-Pose-Estimation)
- [Depth Estimation](#Depth-Estimation)
- [Stereo Matching](#Stereo-Matching)
- [Optical Flow Estimation](#Flow-Estimation)
- [Lane Detection](#Lane-Detection)
- [Trajectory Prediction](#Trajectory-Prediction)
- [Crowd Counting](#Crowd-Counting)
- [Adversarial Examples](#AE)
- [Image Retrieval](#Image-Retrieval)
- [Video Retrieval](#Video-Retrieval)
- [Cross-modal Retrieval](#Cross-modal-Retrieval)
- [Zero-Shot Learning](#Zero-Shot-Learning)
- [Federated Learning](#Federated-Learning)
- [Video Frame Interpolation](#Video-Frame-Interpolation)
- [Visual Reasoning](#Visual-Reasoning)
- [Image Synthesis](#Image-Synthesis)
- [View Synthesis](#Visual-Synthesis)
- [Style Transfer](#Style-Transfer)
- [布局生成(Layout 
Generation)](#Layout-Generation)
- [Domain Generalization](#Domain-Generalization)
- [Domain Adaptation](#Domain-Adaptation)
- [Open-Set](#Open-Set)
- [Adversarial Attack](#Adversarial-Attack)
- [Human-Object Interaction (HOI) Detection](#HOI)
- [Shadow Removal](#Shadow-Removal)
- [Virtual Try-On](#Virtual-Try-On)
- [Label Noise](#Label-Noise)
- [Video Stabilization](#Video-Stabilization)
- [Datasets](#Datasets)
- [Others](#Others)
- [To Be Added (TODO)](#TO-DO)
- [Acceptance Unconfirmed (Not Sure)](#Not-Sure)

<a name="Best-Paper"></a>

# Best Paper

**GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields**

- Homepage: https://m-niemeyer.github.io/project-pages/giraffe/index.html
- Paper(Oral): https://arxiv.org/abs/2011.12100
- Code: https://github.com/autonomousvision/giraffe
- Demo: http://www.youtube.com/watch?v=fIaDXC-qRSg&vq=hd1080&autoplay=1

<a name="Backbone"></a>

# Backbone

**HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers**

- Paper(Oral): https://arxiv.org/abs/2106.06560
- Code: https://github.com/dingmyu/HR-NAS

**BCNet: Searching for Network Width with Bilaterally Coupled Network**

- Paper: https://arxiv.org/abs/2105.10533
- Code: None

**Decoupled Dynamic Filter Networks**

- Homepage: https://thefoxofsky.github.io/project_pages/ddf
- Paper: https://arxiv.org/abs/2104.14107
- Code: https://github.com/thefoxofsky/DDF

**Lite-HRNet: A Lightweight High-Resolution Network**

- Paper: https://arxiv.org/abs/2104.06403
- Code: https://github.com/HRNet/Lite-HRNet

**CondenseNet V2: Sparse Feature Reactivation for Deep Networks**

- Paper: https://arxiv.org/abs/2104.04382
- Code: https://github.com/jianghaojun/CondenseNetV2

**Diverse Branch Block: Building a Convolution as an Inception-like Unit**

- Paper: https://arxiv.org/abs/2103.13425
- Code: https://github.com/DingXiaoH/DiverseBranchBlock

**Scaling Local Self-Attention For Parameter Efficient Visual Backbones**

- Paper(Oral): https://arxiv.org/abs/2103.12731
- Code: 
None

**ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network**

- Paper: https://arxiv.org/abs/2007.00992
- Code: https://github.com/clovaai/rexnet

**Involution: Inverting the Inherence of Convolution for Visual Recognition**

- Paper: https://arxiv.org/abs/2103.06255
- Code: https://github.com/d-li14/involution

**Coordinate Attention for Efficient Mobile Network Design**

- Paper: https://arxiv.org/abs/2103.02907
- Code: https://github.com/Andrew-Qibin/CoordAttention

**Inception Convolution with Efficient Dilation Search**

- Paper: https://arxiv.org/abs/2012.13587
- Code: https://github.com/yifan123/IC-Conv

**RepVGG: Making VGG-style ConvNets Great Again**

- Paper: https://arxiv.org/abs/2101.03697
- Code: https://github.com/DingXiaoH/RepVGG

<a name="NAS"></a>

# NAS

**HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers**

- Paper(Oral): https://arxiv.org/abs/2106.06560
- Code: https://github.com/dingmyu/HR-NAS

**BCNet: Searching for Network Width with Bilaterally Coupled Network**

- Paper: https://arxiv.org/abs/2105.10533
- Code: None

**ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search**

- Paper: https://arxiv.org/abs/2105.10154
- Code: None

**Combined Depth Space based Architecture Search For Person Re-identification**

- Paper: https://arxiv.org/abs/2104.04163
- Code: None

**DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation**

- Paper(Oral): https://arxiv.org/abs/2103.15954
- Code: None

**HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers**

- Paper(Oral): None
- Code: https://github.com/dingmyu/HR-NAS

**Neural Architecture Search with Random Labels**

- Paper: https://arxiv.org/abs/2101.11834
- Code: None

**Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search**

- Paper: https://arxiv.org/abs/2101.11342
- Code: None

**Joint-DetNAS: Upgrade Your Detector with 
NAS, Pruning and Dynamic Distillation** - Paper: https://arxiv.org/abs/2105.12971 - Code: None **Prioritized Architecture Sampling with Monto-Carlo Tree Search** - Paper: https://arxiv.org/abs/2103.11922 - Code: https://github.com/xiusu/NAS-Bench-Macro **Contrastive Neural Architecture Search with Neural Architecture Comparators** - Paper: https://arxiv.org/abs/2103.05471 - Code: https://github.com/chenyaofo/CTNAS **AttentiveNAS: Improving Neural Architecture Search via Attentive** - Paper: https://arxiv.org/abs/2011.09011 - Code: None **ReNAS: Relativistic Evaluation of Neural Architecture Search** - Paper: https://arxiv.org/abs/1910.01523 - Code: None **HourNAS: Extremely Fast Neural Architecture** - Paper: https://arxiv.org/abs/2005.14446 - Code: None **Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator** - Paper: https://arxiv.org/abs/2103.07289 - Code: https://github.com/eric8607242/SGNAS **OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection** - Paper: https://arxiv.org/abs/2103.04507 - Code: https://github.com/VDIGPKU/OPANAS **Inception Convolution with Efficient Dilation Search** - Paper: https://arxiv.org/abs/2012.13587 - Code: None <a name="GAN"></a> # GAN **High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network** - Paper: https://arxiv.org/abs/2105.09188 - Code: https://github.com/csjliang/LPTN - Dataset: https://github.com/csjliang/LPTN **DG-Font: Deformable Generative Networks for Unsupervised Font Generation** - Paper: https://arxiv.org/abs/2104.03064 - Code: https://github.com/ecnuycxie/DG-Font **PD-GAN: Probabilistic Diverse GAN for Image Inpainting** - Paper: https://arxiv.org/abs/2105.02201 - Code: https://github.com/KumapowerLIU/PD-GAN **StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing** - Paper: https://arxiv.org/abs/2104.14754 - Code: https://github.com/naver-ai/StyleMapGAN - Demo Video: 
https://youtu.be/qCapNyRA_Ng **Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer** - Paper: https://arxiv.org/abs/2104.05376 - Code: https://github.com/PaddlePaddle/PaddleGAN/ **Regularizing Generative Adversarial Networks under Limited Data** - Homepage: https://hytseng0509.github.io/lecam-gan/ - Paper: https://faculty.ucmerced.edu/mhyang/papers/cvpr2021_gan_limited_data.pdf - Code: https://github.com/google/lecam-gan **Towards Real-World Blind Face Restoration with Generative Facial Prior** - Paper: https://arxiv.org/abs/2101.04061 - Code: None **TediGAN: Text-Guided Diverse Image Generation and Manipulation** - Homepage: https://xiaweihao.com/projects/tedigan/ - Paper: https://arxiv.org/abs/2012.03308 - Code: https://github.com/weihaox/TediGAN **Generative Hierarchical Features from Synthesizing Image** - Homepage: https://genforce.github.io/ghfeat/ - Paper(Oral): https://arxiv.org/abs/2007.10379 - Code: https://github.com/genforce/ghfeat **Teachers Do More Than Teach: Compressing Image-to-Image Models** - Paper: https://arxiv.org/abs/2103.03467 - Code: https://github.com/snap-research/CAT **HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms** - Paper: https://arxiv.org/abs/2011.11731 - Code: https://github.com/mahmoudnafifi/HistoGAN **pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis** - Homepage: https://marcoamonteiro.github.io/pi-GAN-website/ - Paper(Oral): https://arxiv.org/abs/2012.00926 - Code: None **DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network** - Paper: https://arxiv.org/abs/2103.07893 - Code: None **Diverse Semantic Image Synthesis via Probability Distribution Modeling** - Paper: https://arxiv.org/abs/2103.06878 - Code: https://github.com/tzt101/INADE.git **LOHO: Latent Optimization of Hairstyles via Orthogonalization** - Paper: https://arxiv.org/abs/2103.03891 - Code: None **PISE: Person Image 
Synthesis and Editing with Decoupled GAN** - Paper: https://arxiv.org/abs/2103.04023 - Code: https://github.com/Zhangjinso/PISE **DeFLOCNet: Deep Image Editing via Flexible Low-level Controls** - Paper: http://raywzy.com/ - Code: http://raywzy.com/ **PD-GAN: Probabilistic Diverse GAN for Image Inpainting** - Paper: http://raywzy.com/ - Code: http://raywzy.com/ **Efficient Conditional GAN Transfer with Knowledge Propagation across Classes** - Paper: https://www.researchgate.net/publication/349309756_Efficient_Conditional_GAN_Transfer_with_Knowledge_Propagation_across_Classes - Code: http://github.com/mshahbazi72/cGANTransfer **Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing** - Paper: None - Code: None **Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs** - Paper: https://arxiv.org/abs/2011.14107 - Code: None **Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation** - Homepage: https://eladrich.github.io/pixel2style2pixel/ - Paper: https://arxiv.org/abs/2008.00951 - Code: https://github.com/eladrich/pixel2style2pixel **A 3D GAN for Improved Large-pose Facial Recognition** - Paper: https://arxiv.org/abs/2012.10545 - Code: None **HumanGAN: A Generative Model of Humans Images** - Paper: https://arxiv.org/abs/2103.06902 - Code: None **ID-Unet: Iterative Soft and Hard Deformation for View Synthesis** - Paper: https://arxiv.org/abs/2103.02264 - Code: https://github.com/MingyuY/Iterative-view-synthesis **CoMoGAN: continuous model-guided image-to-image translation** - Paper(Oral): https://arxiv.org/abs/2103.06879 - Code: https://github.com/cv-rits/CoMoGAN **Training Generative Adversarial Networks in One Stage** - Paper: https://arxiv.org/abs/2103.00430 - Code: None **Closed-Form Factorization of Latent Semantics in GANs** - Homepage: https://genforce.github.io/sefa/ - Paper(Oral): https://arxiv.org/abs/2007.06600 - Code: https://github.com/genforce/sefa **Anycost GANs for Interactive Image Synthesis and Editing** - Paper: 
https://arxiv.org/abs/2103.03243
- Code: https://github.com/mit-han-lab/anycost-gan

**Image-to-image Translation via Hierarchical Style Disentanglement**

- Paper: https://arxiv.org/abs/2103.01456
- Code: https://github.com/imlixinyang/HiSD

<a name="VAE"></a>

# VAE

**Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders**

- Homepage: https://taldatech.github.io/soft-intro-vae-web/
- Paper: https://arxiv.org/abs/2012.13253
- Code: https://github.com/taldatech/soft-intro-vae-pytorch

<a name="Visual-Transformer"></a>

# Visual Transformer

**1. End-to-End Human Pose and Mesh Reconstruction with Transformers**

- Paper: https://arxiv.org/abs/2012.09760
- Code: https://github.com/microsoft/MeshTransformer

**2. Temporal-Relational CrossTransformers for Few-Shot Action Recognition**

- Paper: https://arxiv.org/abs/2101.06184
- Code: https://github.com/tobyperrett/trx

**3. Kaleido-BERT: Vision-Language Pre-training on Fashion Domain**

- Paper: https://arxiv.org/abs/2103.16110
- Code: https://github.com/mczhuge/Kaleido-BERT

**4. HOTR: End-to-End Human-Object Interaction Detection with Transformers**

- Paper: https://arxiv.org/abs/2104.13682
- Code: https://github.com/kakaobrain/HOTR

**5. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving**

- Paper: https://arxiv.org/abs/2104.09224
- Code: https://github.com/autonomousvision/transfuser

**6. Pose Recognition with Cascade Transformers**

- Paper: https://arxiv.org/abs/2104.06976
- Code: https://github.com/mlpc-ucsd/PRTR

**7. Variational Transformer Networks for Layout Generation**

- Paper: https://arxiv.org/abs/2104.02416
- Code: None

**8. LoFTR: Detector-Free Local Feature Matching with Transformers**

- Homepage: https://zju3dv.github.io/loftr/
- Paper: https://arxiv.org/abs/2104.00680
- Code: https://github.com/zju3dv/LoFTR

**9. 
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers** - Paper: https://arxiv.org/abs/2012.15840 - Code: https://github.com/fudan-zvg/SETR **10. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers** - Paper: https://arxiv.org/abs/2103.16553 - Code: None **11. Transformer Tracking** - Paper: https://arxiv.org/abs/2103.15436 - Code: https://github.com/chenxin-dlut/TransT **12. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers** - Paper(Oral): https://arxiv.org/abs/2106.06560 - Code: https://github.com/dingmyu/HR-NAS **13. MIST: Multiple Instance Spatial Transformer** - Paper: https://arxiv.org/abs/1811.10725 - Code: None **14. Multimodal Motion Prediction with Stacked Transformers** - Paper: https://arxiv.org/abs/2103.11624 - Code: https://decisionforce.github.io/mmTransformer **15. Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning** - Paper: https://www.amazon.science/publications/revamping-cross-modal-recipe-retrieval-with-hierarchical-transformers-and-self-supervised-learning - Code: https://github.com/amzn/image-to-recipe-transformers **16. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking** - Paper(Oral): https://arxiv.org/abs/2103.11681 - Code: https://github.com/594422814/TransformerTrack **17. Pre-Trained Image Processing Transformer** - Paper: https://arxiv.org/abs/2012.00364 - Code: None **18. End-to-End Video Instance Segmentation with Transformers** - Paper(Oral): https://arxiv.org/abs/2011.14503 - Code: https://github.com/Epiphqny/VisTR **19. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers** - Paper(Oral): https://arxiv.org/abs/2011.09094 - Code: https://github.com/dddzg/up-detr **20. End-to-End Human Object Interaction Detection with HOI Transformer** - Paper: https://arxiv.org/abs/2103.04503 - Code: https://github.com/bbepoch/HoiTransformer **21. 
Transformer Interpretability Beyond Attention Visualization** - Paper: https://arxiv.org/abs/2012.09838 - Code: https://github.com/hila-chefer/Transformer-Explainability **22. Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer** - Paper: None - Code: None **23. LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity** - Paper: None - Code: None **24. Line Segment Detection Using Transformers without Edges** - Paper(Oral): https://arxiv.org/abs/2101.01909 - Code: None **25. MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers** - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_MaX-DeepLab_End-to-End_Panoptic_Segmentation_With_Mask_Transformers_CVPR_2021_paper.html - Code: None **26. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation** - Paper(Oral): https://arxiv.org/abs/2101.08833 - Code: https://github.com/dukebw/SSTVOS **27. Facial Action Unit Detection With Transformers** - Paper: None - Code: None **28. Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition** - Paper: None - Code: None **29. Lesion-Aware Transformers for Diabetic Retinopathy Grading** - Paper: None - Code: None **30. Topological Planning With Transformers for Vision-and-Language Navigation** - Paper: https://arxiv.org/abs/2012.05292 - Code: None **31. Adaptive Image Transformer for One-Shot Object Detection** - Paper: None - Code: None **32. Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos** - Paper: None - Code: None **33. Taming Transformers for High-Resolution Image Synthesis** - Homepage: https://compvis.github.io/taming-transformers/ - Paper(Oral): https://arxiv.org/abs/2012.09841 - Code: https://github.com/CompVis/taming-transformers **34. Self-Supervised Video Hashing via Bidirectional Transformers** - Paper: None - Code: None **35. 
Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos** - Paper(Oral): https://hehefan.github.io/pdfs/p4transformer.pdf - Code: None **36. Gaussian Context Transformer** - Paper: None - Code: None **37. General Multi-Label Image Classification With Transformers** - Paper: https://arxiv.org/abs/2011.14027 - Code: None **38. Bottleneck Transformers for Visual Recognition** - Paper: https://arxiv.org/abs/2101.11605 - Code: None **39. VLN BERT: A Recurrent Vision-and-Language BERT for Navigation** - Paper(Oral): https://arxiv.org/abs/2011.13922 - Code: https://github.com/YicongHong/Recurrent-VLN-BERT **40. Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling** - Paper(Oral): https://arxiv.org/abs/2102.06183 - Code: https://github.com/jayleicn/ClipBERT **41. Self-attention based Text Knowledge Mining for Text Detection** - Paper: None - Code: https://github.com/CVI-SZU/STKM **42. SSAN: Separable Self-Attention Network for Video Representation Learning** - Paper: None - Code: None **43. 
Scaling Local Self-Attention For Parameter Efficient Visual Backbones**

- Paper(Oral): https://arxiv.org/abs/2103.12731
- Code: None

<a name="Regularization"></a>

# Regularization

**Regularizing Neural Networks via Adversarial Model Perturbation**

- Paper: https://arxiv.org/abs/2010.04925
- Code: https://github.com/hiyouga/AMP-Regularizer

<a name="SLAM"></a>

# SLAM

**Differentiable SLAM-net: Learning Particle SLAM for Visual Navigation**

- Paper: https://arxiv.org/abs/2105.07593
- Code: None

**Generalizing to the Open World: Deep Visual Odometry with Online Adaptation**

- Paper: https://arxiv.org/abs/2103.15279
- Code: https://arxiv.org/abs/2103.15279

<a name="Long-Tailed"></a>

# Long-Tailed

**Adversarial Robustness under Long-Tailed Distribution**

- Paper(Oral): https://arxiv.org/abs/2104.02703
- Code: https://github.com/wutong16/Adversarial_Long-Tail

**Distribution Alignment: A Unified Framework for Long-tail Visual Recognition**

- Paper: https://arxiv.org/abs/2103.16370
- Code: https://github.com/Megvii-BaseDetection/DisAlign

**Adaptive Class Suppression Loss for Long-Tail Object Detection**

- Paper: https://arxiv.org/abs/2104.00885
- Code: https://github.com/CASIA-IVA-Lab/ACSL

**Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification**

- Paper: https://arxiv.org/abs/2103.14267
- Code: None

<a name="DA"></a>

# Data Augmentation

**Scale-aware Automatic Augmentation for Object Detection**

- Paper: https://arxiv.org/abs/2103.17220
- Code: https://github.com/Jia-Research-Lab/SA-AutoAug

<a name="Un/Self-Supervised"></a>

# Unsupervised/Self-Supervised

**Domain-Specific Suppression for Adaptive Object Detection**

- Paper: https://arxiv.org/abs/2105.03570
- Code: None

**A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning**

- Paper: https://arxiv.org/abs/2104.14558
- Code: https://github.com/facebookresearch/SlowFast

**Unsupervised Multi-Source Domain Adaptation for Person Re-Identification**

- Paper: 
https://arxiv.org/abs/2104.12961
- Code: None

**Self-supervised Video Representation Learning by Context and Motion Decoupling**

- Paper: https://arxiv.org/abs/2104.00862
- Code: None

**Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning**

- Homepage: https://fingerrec.github.io/index_files/jinpeng/papers/CVPR2021/project_website.html
- Paper: https://arxiv.org/abs/2009.05769
- Code: https://github.com/FingerRec/BE

**Spatially Consistent Representation Learning**

- Paper: https://arxiv.org/abs/2103.06122
- Code: None

**VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples**

- Paper: https://arxiv.org/abs/2103.05905
- Code: https://github.com/tinapan-pt/VideoMoCo

**Exploring Simple Siamese Representation Learning**

- Paper(Oral): https://arxiv.org/abs/2011.10566
- Code: None

**Dense Contrastive Learning for Self-Supervised Visual Pre-Training**

- Paper(Oral): https://arxiv.org/abs/2011.09157
- Code: https://github.com/WXinlong/DenseCL

<a name="Semi-Supervised"></a>

# Semi-Supervised Learning

**Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework**

- Affiliations: Alibaba
- Paper: https://arxiv.org/abs/2103.11402
- Code: None

**Adaptive Consistency Regularization for Semi-Supervised Transfer Learning**

- Paper: https://arxiv.org/abs/2103.02193
- Code: https://github.com/SHI-Labs/Semi-Supervised-Transfer-Learning

<a name="Capsule-Network"></a>

# Capsule Network

**Capsule Network is Not More Robust than Convolutional Network**

- Paper: https://arxiv.org/abs/2103.15459
- Code: None

<a name="Image-Classification"></a>

# Image Classification

**Correlated Input-Dependent Label Noise in Large-Scale Image Classification**

- Paper(Oral): https://arxiv.org/abs/2105.10305
- Code: https://github.com/google/uncertainty-baselines/tree/master/baselines/imagenet

<a name="Object-Detection"></a>

# 2D Object Detection

## 2D Object Detection

**1. 
Scaled-YOLOv4: Scaling Cross Stage Partial Network**

- Affiliations: Academia Sinica, Intel, Providence University
- Paper: https://arxiv.org/abs/2011.08036
- Code: https://github.com/WongKinYiu/ScaledYOLOv4
- Chinese write-up: [The official improved YOLOv4 is here: 55.8% AP, up to 1774 FPS, Scaled-YOLOv4 is now open source](https://mp.weixin.qq.com/s/AcrJPNoAVhn8cGBUGK7ekA)

**2. You Only Look One-level Feature**

- Affiliations: Chinese Academy of Sciences, University of Chinese Academy of Sciences, Megvii
- Paper: https://arxiv.org/abs/2103.09460
- Code: https://github.com/megvii-model/YOLOF
- Chinese write-up: [CVPR 2021 | No FPN: CAS and Megvii propose YOLOF, You Only Look One-level Feature](https://mp.weixin.qq.com/s/EJqAG1gTVaP2icI6QL742A)

**3. Sparse R-CNN: End-to-End Object Detection with Learnable Proposals**

- Affiliations: The University of Hong Kong, Tongji University, ByteDance AI Lab, UC Berkeley
- Paper: https://arxiv.org/abs/2011.12450
- Code: https://github.com/PeizeSun/SparseR-CNN
- Chinese write-up: [A new paradigm for object detection: HKU, Tongji, and Berkeley propose Sparse R-CNN, code just released](https://mp.weixin.qq.com/s/P2Zgh1wTqf8L2976El5nfQ)

**4. End-to-End Object Detection with Fully Convolutional Network**

- Affiliations: Megvii, Xi'an Jiaotong University
- Paper: https://arxiv.org/abs/2012.03544
- Code: https://github.com/Megvii-BaseDetection/DeFCN

**5. Dynamic Head: Unifying Object Detection Heads with Attentions**

- Affiliations: Microsoft
- Paper: https://arxiv.org/abs/2106.08322
- Code: https://github.com/microsoft/DynamicHead
- Chinese write-up: [60.6 AP, a new COCO record: Microsoft proposes DyHead, unifying attention with object detection heads](https://mp.weixin.qq.com/s/uYPUqVXwNau71VAYW3bYIA)

**6. Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection**

- Affiliations: Nanjing University of Science and Technology, Momenta, Nanjing University, Tsinghua University
- Paper: https://arxiv.org/abs/2011.12885
- Code: https://github.com/implus/GFocalV2
- Chinese write-up: [CVPR 2021 | GFLV2: a practical object detection technique that improves accuracy at no extra cost](https://mp.weixin.qq.com/s/JB7k3NwXU-cDueg6w9mghQ)

**7. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers**

- Affiliations: South China University of Technology, Tencent WeChat AI
- Paper(Oral): https://arxiv.org/abs/2011.09094
- Code: https://github.com/dddzg/up-detr
- Chinese write-up: [CVPR 2021 Oral | Transformers strike again: SCUT and WeChat propose UP-DETR, an unsupervised pre-trained detector](https://mp.weixin.qq.com/s/Hprp7B16SGFhVEKXfKiRBQ)

**8. 
MobileDets: Searching for Object Detection Architectures for Mobile Accelerators**

- Affiliations: University of Wisconsin, Google
- Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Xiong_MobileDets_Searching_for_Object_Detection_Architectures_for_Mobile_Accelerators_CVPR_2021_paper.pdf
- Code: https://github.com/tensorflow/models/tree/master/research/object_detection

**9. Tracking Pedestrian Heads in Dense Crowd**

- Affiliations: University of Rennes 1
- Homepage: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sundararaman_Tracking_Pedestrian_Heads_in_Dense_Crowd_CVPR_2021_paper.html
- Code1: https://github.com/Sentient07/HeadHunter
- Code2: https://github.com/Sentient07/HeadHunter%E2%80%93T
- Dataset: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/

**10. Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation**

- Affiliations: Hong Kong University of Science and Technology, Huawei Noah's Ark Lab
- Paper: https://arxiv.org/abs/2105.12971
- Code: None

**11. PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery**

- Affiliations: A*STAR, Sichuan University, Nanyang Technological University
- Paper: https://arxiv.org/abs/2105.12990
- Code: None

**12. IQDet: Instance-wise Quality Distribution Sampling for Object Detection**

- Affiliations: Megvii
- Paper: https://arxiv.org/abs/2104.06936
- Code: None

**13. Multi-Scale Aligned Distillation for Low-Resolution Detection**

- Affiliations: The Chinese University of Hong Kong, Adobe Research, SmartMore
- Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf
- Code: https://github.com/Jia-Research-Lab/MSAD

**14. Adaptive Class Suppression Loss for Long-Tail Object Detection**

- Affiliations: Chinese Academy of Sciences, University of Chinese Academy of Sciences, ObjectEye, Peking University, Peng Cheng Laboratory, Nexwise
- Paper: https://arxiv.org/abs/2104.00885
- Code: https://github.com/CASIA-IVA-Lab/ACSL

**15. VarifocalNet: An IoU-aware Dense Object Detector**

- Affiliations: Queensland University of Technology, The University of Queensland
- Paper(Oral): https://arxiv.org/abs/2008.13367
- Code: https://github.com/hyz-xmaster/VarifocalNet

**16. 
OTA: Optimal Transport Assignment for Object Detection**

- Affiliations: Waseda University, Megvii
- Paper: https://arxiv.org/abs/2103.14259
- Code: https://github.com/Megvii-BaseDetection/OTA

**17. Distilling Object Detectors via Decoupled Features**

- Affiliations: Huawei Noah's Ark Lab, The University of Sydney
- Paper: https://arxiv.org/abs/2103.14475
- Code: https://github.com/ggjy/DeFeat.pytorch

**18. Robust and Accurate Object Detection via Adversarial Learning**

- Affiliations: Google, UCLA, UCSC
- Paper: https://arxiv.org/abs/2103.13886
- Code: None

**19. OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection**

- Affiliations: Peking University, Anyvision, Stony Brook University
- Paper: https://arxiv.org/abs/2103.04507
- Code: https://github.com/VDIGPKU/OPANAS

**20. Multiple Instance Active Learning for Object Detection**

- Affiliations: University of Chinese Academy of Sciences, Huawei Noah's Ark Lab, Tsinghua University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Yuan_Multiple_Instance_Active_Learning_for_Object_Detection_CVPR_2021_paper.pdf
- Code: https://github.com/yuantn/MI-AOD

**21. Towards Open World Object Detection**

- Affiliations: Indian Institute of Technology, MBZUAI, Australian National University, Linköping University
- Paper(Oral): https://arxiv.org/abs/2103.02603
- Code: https://github.com/JosephKJ/OWOD

**22. RankDetNet: Delving Into Ranking Constraints for Object Detection**

- Affiliations: Xilinx
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_RankDetNet_Delving_Into_Ranking_Constraints_for_Object_Detection_CVPR_2021_paper.html
- Code: None

## Rotated Object Detection

**23. Dense Label Encoding for Boundary Discontinuity Free Rotation Detection**

- Affiliations: Shanghai Jiao Tong University, University of Chinese Academy of Sciences
- Paper: https://arxiv.org/abs/2011.09670
- Code1: https://github.com/Thinklab-SJTU/DCL_RetinaNet_Tensorflow
- Code2: https://github.com/yangxue0827/RotationDetection

**24. ReDet: A Rotation-equivariant Detector for Aerial Object Detection**

- Affiliations: Wuhan University
- Paper: https://arxiv.org/abs/2103.07733
- Code: https://github.com/csuhan/ReDet

**25. 
Beyond Bounding-Box: Convex-Hull Feature Adaptation for Oriented and Densely Packed Object Detection**

- Affiliations: University of Chinese Academy of Sciences, Tsinghua University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Guo_Beyond_Bounding-Box_Convex-Hull_Feature_Adaptation_for_Oriented_and_Densely_Packed_CVPR_2021_paper.html
- Code: https://github.com/SDL-GuoZonghao/BeyondBoundingBox

## Few-Shot Object Detection

**26. Accurate Few-Shot Object Detection With Support-Query Mutual Guidance and Hybrid Loss**

- Affiliations: Fudan University, Tongji University, Zhejiang University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhang_Accurate_Few-Shot_Object_Detection_With_Support-Query_Mutual_Guidance_and_Hybrid_CVPR_2021_paper.html
- Code: None

**27. Adaptive Image Transformer for One-Shot Object Detection**

- Affiliations: Academia Sinica, Taiwan AI Labs
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Adaptive_Image_Transformer_for_One-Shot_Object_Detection_CVPR_2021_paper.html
- Code: None

**28. Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection**

- Affiliations: Peking University, Beijing University of Posts and Telecommunications
- Paper: https://arxiv.org/abs/2103.17115
- Code: https://github.com/hzhupku/DCNet

**29. Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection**

- Affiliations: Carnegie Mellon University (CMU)
- Paper: https://arxiv.org/abs/2103.01903
- Code: None

**30. FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding**

- Affiliations: University of Southern California, Megvii
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sun_FSCE_Few-Shot_Object_Detection_via_Contrastive_Proposal_Encoding_CVPR_2021_paper.html
- Code: https://github.com/MegviiDetection/FSCE

**31. Hallucination Improves Few-Shot Object Detection**

- Affiliations: University of Illinois Urbana-Champaign
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhang_Hallucination_Improves_Few-Shot_Object_Detection_CVPR_2021_paper.html
- Code: https://github.com/pppplin/HallucFsDet

**32. 
Few-Shot Object Detection via Classification Refinement and Distractor Retreatment**

- Affiliations: National University of Singapore, SIMTech
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Few-Shot_Object_Detection_via_Classification_Refinement_and_Distractor_Retreatment_CVPR_2021_paper.html
- Code: None

**33. Generalized Few-Shot Object Detection Without Forgetting**

- Affiliations: Megvii
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Fan_Generalized_Few-Shot_Object_Detection_Without_Forgetting_CVPR_2021_paper.html
- Code: None

**34. Transformation Invariant Few-Shot Object Detection**

- Affiliations: Huawei Noah's Ark Lab
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Transformation_Invariant_Few-Shot_Object_Detection_CVPR_2021_paper.html
- Code: None

**35. UniT: Unified Knowledge Transfer for Any-Shot Object Detection and Segmentation**

- Affiliations: University of British Columbia, Vector AI, CIFAR AI Chair
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Khandelwal_UniT_Unified_Knowledge_Transfer_for_Any-Shot_Object_Detection_and_Segmentation_CVPR_2021_paper.html
- Code: https://github.com/ubc-vision/UniT

**36. Beyond Max-Margin: Class Margin Equilibrium for Few-Shot Object Detection**

- Affiliations: University of Chinese Academy of Sciences, Xiamen University, Peng Cheng Laboratory
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Beyond_Max-Margin_Class_Margin_Equilibrium_for_Few-Shot_Object_Detection_CVPR_2021_paper.html
- Code: https://github.com/Bohao-Lee/CME

## Semi-Supervised Object Detection

**37. Points As Queries: Weakly Semi-Supervised Object Detection by Points**

- Affiliations: Megvii, Fudan University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Points_As_Queries_Weakly_Semi-Supervised_Object_Detection_by_Points_CVPR_2021_paper.html
- Code: None

**38. Data-Uncertainty Guided Multi-Phase Learning for Semi-Supervised Object Detection**

- Affiliations: Tsinghua University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Data-Uncertainty_Guided_Multi-Phase_Learning_for_Semi-Supervised_Object_Detection_CVPR_2021_paper.html
- Code: None

**39. 
Positive-Unlabeled Data Purification in the Wild for Object Detection** - 作者单位: 华为诺亚方舟实验室, 悉尼大学, 北京大学 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Guo_Positive-Unlabeled_Data_Purification_in_the_Wild_for_Object_Detection_CVPR_2021_paper.html - Code: None **40. Interactive Self-Training With Mean Teachers for Semi-Supervised Object Detection** - 作者单位: 阿里巴巴, 香港理工大学 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Yang_Interactive_Self-Training_With_Mean_Teachers_for_Semi-Supervised_Object_Detection_CVPR_2021_paper.html - Code: None **41. Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework** - 作者单位: 阿里巴巴 - Paper: https://arxiv.org/abs/2103.11402 - Code: None **42. Humble Teachers Teach Better Students for Semi-Supervised Object Detection** - 作者单位: 卡内基梅隆大学(CMU), 亚马逊 - Homepage: https://yihet.com/humble-teacher - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Tang_Humble_Teachers_Teach_Better_Students_for_Semi-Supervised_Object_Detection_CVPR_2021_paper.html - Code: https://github.com/lryta/HumbleTeacher **43. Interpolation-Based Semi-Supervised Learning for Object Detection** - 作者单位: 首尔大学, 阿尔托大学等 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Jeong_Interpolation-Based_Semi-Supervised_Learning_for_Object_Detection_CVPR_2021_paper.html - Code: https://github.com/soo89/ISD-SSD ## 域自适应目标检测 **44. Domain-Specific Suppression for Adaptive Object Detection** - 作者单位: 中科院, 寒武纪, 国科大 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Domain-Specific_Suppression_for_Adaptive_Object_Detection_CVPR_2021_paper.html - Code: None **45. MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection** - 作者单位: 约翰斯·霍普金斯大学, 梅赛德斯—奔驰 - Paper: https://arxiv.org/abs/2103.04224 - Code: None **46.
Unbiased Mean Teacher for Cross-Domain Object Detection** - 作者单位: 电子科技大学, ETH Zurich - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Deng_Unbiased_Mean_Teacher_for_Cross-Domain_Object_Detection_CVPR_2021_paper.html - Code: https://github.com/kinredon/umt **47. I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors** - 作者单位: 香港大学, 厦门大学, Deepwise AI Lab - Paper: https://arxiv.org/abs/2103.13757 - Code: None ## 自监督目标检测 **48. There Is More Than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking With Sound by Distilling Multimodal Knowledge** - 作者单位: 弗莱堡大学 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Valverde_There_Is_More_Than_Meets_the_Eye_Self-Supervised_Multi-Object_Detection_CVPR_2021_paper.html - Code: http://rl.uni-freiburg.de/research/multimodal-distill **49. Instance Localization for Self-supervised Detection Pretraining** - 作者单位: 香港中文大学, 微软亚洲研究院 - Paper: https://arxiv.org/abs/2102.08318 - Code: https://github.com/limbo0000/InstanceLoc ## 弱监督目标检测 **50. Informative and Consistent Correspondence Mining for Cross-Domain Weakly Supervised Object Detection** - 作者单位: 北航, 鹏城实验室, 商汤科技 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hou_Informative_and_Consistent_Correspondence_Mining_for_Cross-Domain_Weakly_Supervised_Object_CVPR_2021_paper.html - Code: None **51. DAP: Detection-Aware Pre-training with Weak Supervision** - 作者单位: UIUC, 微软 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhong_DAP_Detection-Aware_Pre-Training_With_Weak_Supervision_CVPR_2021_paper.html - Code: None ## 其他 **52. Open-Vocabulary Object Detection Using Captions** - 作者单位:Snap, 哥伦比亚大学 - Paper(Oral): https://openaccess.thecvf.com/content/CVPR2021/html/Zareian_Open-Vocabulary_Object_Detection_Using_Captions_CVPR_2021_paper.html - Code: https://github.com/alirezazareian/ovr-cnn **53. 
Depth From Camera Motion and Object Detection** - 作者单位: 密歇根大学, SIAI - Paper: https://arxiv.org/abs/2103.01468 - Code: https://github.com/griffbr/ODMD - Dataset: https://github.com/griffbr/ODMD **54. Unsupervised Object Detection With LIDAR Clues** - 作者单位: 商汤科技, 国科大, 中科大 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Tian_Unsupervised_Object_Detection_With_LIDAR_Clues_CVPR_2021_paper.html - Code: None **55. GAIA: A Transfer Learning System of Object Detection That Fits Your Needs** - 作者单位: 国科大, 北理, 中科院, 商汤科技 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Bu_GAIA_A_Transfer_Learning_System_of_Object_Detection_That_Fits_CVPR_2021_paper.html - Code: https://github.com/GAIA-vision/GAIA-det **56. General Instance Distillation for Object Detection** - 作者单位: 旷视科技, 北航 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Dai_General_Instance_Distillation_for_Object_Detection_CVPR_2021_paper.html - Code: None **57. AQD: Towards Accurate Quantized Object Detection** - 作者单位: 蒙纳士大学, 阿德莱德大学, 华南理工大学 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_AQD_Towards_Accurate_Quantized_Object_Detection_CVPR_2021_paper.html - Code: https://github.com/aim-uofa/model-quantization **58. Scale-Aware Automatic Augmentation for Object Detection** - 作者单位: 香港中文大学, 字节跳动AI Lab, 思谋科技 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Scale-Aware_Automatic_Augmentation_for_Object_Detection_CVPR_2021_paper.html - Code: https://github.com/Jia-Research-Lab/SA-AutoAug **59. Equalization Loss v2: A New Gradient Balance Approach for Long-Tailed Object Detection** - 作者单位: 同济大学, 商汤科技, 清华大学 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Tan_Equalization_Loss_v2_A_New_Gradient_Balance_Approach_for_Long-Tailed_CVPR_2021_paper.html - Code: https://github.com/tztztztztz/eqlv2 **60. 
Class-Aware Robust Adversarial Training for Object Detection** - 作者单位: 哥伦比亚大学, 中央研究院 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Class-Aware_Robust_Adversarial_Training_for_Object_Detection_CVPR_2021_paper.html - Code: None **61. Improved Handling of Motion Blur in Online Object Detection** - 作者单位: 伦敦大学学院 - Homepage: http://visual.cs.ucl.ac.uk/pubs/handlingMotionBlur/ - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sayed_Improved_Handling_of_Motion_Blur_in_Online_Object_Detection_CVPR_2021_paper.html - Code: None **62. Multiple Instance Active Learning for Object Detection** - 作者单位: 国科大, 华为诺亚 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Yuan_Multiple_Instance_Active_Learning_for_Object_Detection_CVPR_2021_paper.html - Code: https://github.com/yuantn/MI-AOD **63. Neural Auto-Exposure for High-Dynamic Range Object Detection** - 作者单位: Algolux, 普林斯顿大学 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Onzon_Neural_Auto-Exposure_for_High-Dynamic_Range_Object_Detection_CVPR_2021_paper.html - Code: None **64. Generalizable Pedestrian Detection: The Elephant in the Room** - 作者单位: IIAI, 阿尔托大学 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hasan_Generalizable_Pedestrian_Detection_The_Elephant_in_the_Room_CVPR_2021_paper.html - Code: https://github.com/hasanirtiza/Pedestron
<a name="Object-Tracking"></a> # 单/多目标跟踪(Object Tracking) ## 单目标跟踪 **LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search** - Paper: https://arxiv.org/abs/2104.14545 - Code: https://github.com/researchmm/LightTrack **Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark** - Homepage: https://sites.google.com/view/langtrackbenchmark/ - Paper: https://arxiv.org/abs/2103.16746 - Evaluation Toolkit: https://github.com/wangxiao5791509/TNL2K_evaluation_toolkit - Demo Video: https://www.youtube.com/watch?v=7lvVDlkkff0&ab_channel=XiaoWang **IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking** - Paper: https://arxiv.org/abs/2103.14938 - Code: https://github.com/VISION-SJTU/IoUattack **Graph Attention Tracking** - Paper: https://arxiv.org/abs/2011.11204 - Code: https://github.com/ohhhyeahhh/SiamGAT **Rotation Equivariant Siamese Networks for Tracking** - Paper: https://arxiv.org/abs/2012.13078 - Code: None **Track to Detect and Segment: An Online Multi-Object Tracker** - Homepage: https://jialianwu.com/projects/TraDeS.html - Paper: None - Code: None **Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking** - Paper(Oral): https://arxiv.org/abs/2103.11681 - Code: https://github.com/594422814/TransformerTrack **Transformer Tracking** - Paper: https://arxiv.org/abs/2103.15436 - Code: https://github.com/chenxin-dlut/TransT ## 多目标跟踪 **Tracking Pedestrian Heads in Dense Crowd** - Homepage: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/ - Paper:
https://openaccess.thecvf.com/content/CVPR2021/html/Sundararaman_Tracking_Pedestrian_Heads_in_Dense_Crowd_CVPR_2021_paper.html - Code1: https://github.com/Sentient07/HeadHunter - Code2: https://github.com/Sentient07/HeadHunter%E2%80%93T - Dataset: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/ **Multiple Object Tracking with Correlation Learning** - Paper: https://arxiv.org/abs/2104.03541 - Code: None **Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking** - Paper: https://arxiv.org/abs/2012.02337 - Code: None **Learning a Proposal Classifier for Multiple Object Tracking** - Paper: https://arxiv.org/abs/2103.07889 - Code: https://github.com/daip13/LPC_MOT.git **Track to Detect and Segment: An Online Multi-Object Tracker** - Homepage: https://jialianwu.com/projects/TraDeS.html - Paper: https://arxiv.org/abs/2103.08808 - Code: https://github.com/JialianW/TraDeS <a name="Semantic-Segmentation"></a> # 语义分割(Semantic Segmentation) **1. HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation** - 作者单位: Facebook AI, 巴伊兰大学, 特拉维夫大学 - Homepage: https://nirkin.com/hyperseg/ - Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Nirkin_HyperSeg_Patch-Wise_Hypernetwork_for_Real-Time_Semantic_Segmentation_CVPR_2021_paper.pdf - Code: https://github.com/YuvalNirkin/hyperseg **2. Rethinking BiSeNet For Real-time Semantic Segmentation** - 作者单位: 美团 - Paper: https://arxiv.org/abs/2104.13188 - Code: https://github.com/MichaelFan01/STDC-Seg **3. Progressive Semantic Segmentation** - 作者单位: VinAI Research, VinUniversity, 阿肯色大学, 石溪大学 - Paper: https://arxiv.org/abs/2104.03778 - Code: https://github.com/VinAIResearch/MagNet **4. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers** - 作者单位: 复旦大学, 牛津大学, 萨里大学, 腾讯优图, Facebook AI - Homepage: https://fudan-zvg.github.io/SETR - Paper: https://arxiv.org/abs/2012.15840 - Code: https://github.com/fudan-zvg/SETR **5. 
Capturing Omni-Range Context for Omnidirectional Segmentation** - 作者单位: 卡尔斯鲁厄理工学院, 卡尔·蔡司, 华为 - Paper: https://arxiv.org/abs/2103.05687 - Code: None **6. Learning Statistical Texture for Semantic Segmentation** - 作者单位: 北航, 商汤科技 - Paper: https://arxiv.org/abs/2103.04133 - Code: None **7. InverseForm: A Loss Function for Structured Boundary-Aware Segmentation** - 作者单位: 高通AI研究院 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Borse_InverseForm_A_Loss_Function_for_Structured_Boundary-Aware_Segmentation_CVPR_2021_paper.html - Code: None **8. DCNAS: Densely Connected Neural Architecture Search for Semantic Image Segmentation** - 作者单位: Joyy Inc, 快手, 北航等 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhang_DCNAS_Densely_Connected_Neural_Architecture_Search_for_Semantic_Image_Segmentation_CVPR_2021_paper.html - Code: None ## 弱监督语义分割 **9. Railroad Is Not a Train: Saliency As Pseudo-Pixel Supervision for Weakly Supervised Semantic Segmentation** - 作者单位: 延世大学, 成均馆大学 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Lee_Railroad_Is_Not_a_Train_Saliency_As_Pseudo-Pixel_Supervision_for_CVPR_2021_paper.html - Code: https://github.com/halbielee/EPS **10. Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation** - 作者单位: 延世大学 - Homepage: https://cvlab.yonsei.ac.kr/projects/BANA/ - Paper: https://arxiv.org/abs/2104.00905 - Code: None **11. Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation** - 作者单位: 南京理工大学, MBZUAI, 电子科技大学, 阿德莱德大学, 悉尼科技大学 - Paper: https://arxiv.org/abs/2103.14581 - Code: https://github.com/NUST-Machine-Intelligence-Laboratory/nsrom **12. Embedded Discriminative Attention Mechanism for Weakly Supervised Semantic Segmentation** - 作者单位: 北京理工大学, 美团 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wu_Embedded_Discriminative_Attention_Mechanism_for_Weakly_Supervised_Semantic_Segmentation_CVPR_2021_paper.html - Code: https://github.com/allenwu97/EDAM **13. 
BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation** - 作者单位: 首尔大学 - Paper: https://arxiv.org/abs/2103.08907 - Code: https://github.com/jbeomlee93/BBAM ## 半监督语义分割 **14. Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision** - 作者单位: 北京大学, 微软亚洲研究院 - Paper: https://arxiv.org/abs/2106.01226 - Code: https://github.com/charlesCXK/TorchSemiSeg **15. Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation** - 作者单位: 华为, 大连理工大学, 北京大学 - Paper: https://arxiv.org/abs/2103.04705 - Code: None **16. Semi-Supervised Semantic Segmentation With Directional Context-Aware Consistency** - 作者单位: 香港中文大学, 思谋科技, 牛津大学 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Lai_Semi-Supervised_Semantic_Segmentation_With_Directional_Context-Aware_Consistency_CVPR_2021_paper.html - Code: None **17. Semantic Segmentation With Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization** - 作者单位: NVIDIA, 多伦多大学, 耶鲁大学, MIT, Vector Institute - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Semantic_Segmentation_With_Generative_Models_Semi-Supervised_Learning_and_Strong_Out-of-Domain_CVPR_2021_paper.html - Code: https://nv-tlabs.github.io/semanticGAN/ **18. Three Ways To Improve Semantic Segmentation With Self-Supervised Depth Estimation** - 作者单位: ETH Zurich, 伯恩大学, 鲁汶大学 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hoyer_Three_Ways_To_Improve_Semantic_Segmentation_With_Self-Supervised_Depth_Estimation_CVPR_2021_paper.html - Code: https://github.com/lhoyer/improving_segmentation_with_selfsupervised_depth ## 域自适应语义分割 **19. Cluster, Split, Fuse, and Update: Meta-Learning for Open Compound Domain Adaptive Semantic Segmentation** - 作者单位: ETH Zurich, 鲁汶大学, 电子科技大学 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Gong_Cluster_Split_Fuse_and_Update_Meta-Learning_for_Open_Compound_Domain_CVPR_2021_paper.html - Code: None **20. 
Source-Free Domain Adaptation for Semantic Segmentation** - 作者单位: 华东师范大学 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_Source-Free_Domain_Adaptation_for_Semantic_Segmentation_CVPR_2021_paper.html - Code: None **21. Uncertainty Reduction for Model Adaptation in Semantic Segmentation** - 作者单位: Idiap Research Institute, EPFL, 日内瓦大学 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/S_Uncertainty_Reduction_for_Model_Adaptation_in_Semantic_Segmentation_CVPR_2021_paper.html - Code: https://git.io/JthPp **22. Self-Supervised Augmentation Consistency for Adapting Semantic Segmentation** - 作者单位: 达姆施塔特工业大学, hessian.AI - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Araslanov_Self-Supervised_Augmentation_Consistency_for_Adapting_Semantic_Segmentation_CVPR_2021_paper.html - Code: https://github.com/visinf/da-sac **23. RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening** - 作者单位: LG AI研究院, KAIST等 - Paper: https://arxiv.org/abs/2103.15597 - Code: https://github.com/shachoi/RobustNet **24. Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization** - 作者单位: 香港大学, 深睿医疗 - Paper: https://arxiv.org/abs/2103.13041 - Code: None **25. MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation** - 作者单位: 香港城市大学, 百度 - Paper: https://arxiv.org/abs/2103.05254 - Code: https://github.com/cyang-cityu/MetaCorrection **26. Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation** - 作者单位: 华为云, 华为诺亚, 大连理工大学 - Paper: https://arxiv.org/abs/2103.04717 - Code: None **27. Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation** - 作者单位: 中国科学技术大学, 微软亚洲研究院 - Paper: https://arxiv.org/abs/2101.10979 - Code: https://github.com/microsoft/ProDA **28. 
DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation** - 作者单位: 南卡罗来纳大学, 天远视科技 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wu_DANNet_A_One-Stage_Domain_Adaptation_Network_for_Unsupervised_Nighttime_Semantic_CVPR_2021_paper.html - Code: https://github.com/W-zx-Y/DANNet ## Few-Shot语义分割 **29. Scale-Aware Graph Neural Network for Few-Shot Semantic Segmentation** - 作者单位: MBZUAI, IIAI, 哈工大 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Xie_Scale-Aware_Graph_Neural_Network_for_Few-Shot_Semantic_Segmentation_CVPR_2021_paper.html - Code: None **30. Anti-Aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation** - 作者单位: 国科大, 清华大学 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_Anti-Aliasing_Semantic_Reconstruction_for_Few-Shot_Semantic_Segmentation_CVPR_2021_paper.html - Code: https://github.com/Bibkiller/ASR ## 无监督语义分割 **31. PiCIE: Unsupervised Semantic Segmentation Using Invariance and Equivariance in Clustering** - 作者单位: UT-Austin, 康奈尔大学 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Cho_PiCIE_Unsupervised_Semantic_Segmentation_Using_Invariance_and_Equivariance_in_Clustering_CVPR_2021_paper.html - Code: https://github.com/janghyuncho/PiCIE ## 视频语义分割 **32. VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild** - 作者单位: 浙江大学, 百度, 悉尼科技大学 - Homepage: https://www.vspwdataset.com/ - Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf - GitHub: https://github.com/sssdddwww2/vspw_dataset_download ## 其它 **33. Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations** - 作者单位: 帕多瓦大学 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Michieli_Continual_Semantic_Segmentation_via_Repulsion-Attraction_of_Sparse_and_Disentangled_Latent_CVPR_2021_paper.html - Code: https://lttm.dei.unipd.it/paper_data/SDR/ **34.
Exploit Visual Dependency Relations for Semantic Segmentation** - 作者单位: 伊利诺伊大学芝加哥分校 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_Exploit_Visual_Dependency_Relations_for_Semantic_Segmentation_CVPR_2021_paper.html - Code: None **35. Revisiting Superpixels for Active Learning in Semantic Segmentation With Realistic Annotation Costs** - 作者单位: Institute for Infocomm Research, 新加坡国立大学 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Cai_Revisiting_Superpixels_for_Active_Learning_in_Semantic_Segmentation_With_Realistic_CVPR_2021_paper.html - Code: None **36. PLOP: Learning without Forgetting for Continual Semantic Segmentation** - 作者单位: 索邦大学, Heuritech, Datakalab, Valeo.ai - Paper: https://arxiv.org/abs/2011.11390 - Code: https://github.com/arthurdouillard/CVPR2021_PLOP **37. 3D-to-2D Distillation for Indoor Scene Parsing** - 作者单位: 香港中文大学, 香港大学 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_3D-to-2D_Distillation_for_Indoor_Scene_Parsing_CVPR_2021_paper.html - Code: None **38. Bidirectional Projection Network for Cross Dimension Scene Understanding** - 作者单位: 香港中文大学, 牛津大学等 - Paper(Oral): https://arxiv.org/abs/2103.14326 - Code: https://github.com/wbhu/BPNet **39. 
PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation** - 作者单位: 北京大学, 中科院, 国科大, ETH Zurich, 商汤科技等 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_PointFlow_Flowing_Semantics_Through_Points_for_Aerial_Image_Segmentation_CVPR_2021_paper.html - Code: https://github.com/lxtGH/PFSegNets <a name="Instance-Segmentation"></a> # 实例分割(Instance Segmentation) **DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation** - Paper: https://arxiv.org/abs/2011.09876 - Code: https://github.com/aliyun/DCT-Mask **Incremental Few-Shot Instance Segmentation** - Paper: https://arxiv.org/abs/2105.05312 - Code: https://github.com/danganea/iMTFA **A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation** - Paper: https://arxiv.org/abs/2105.03186 - Code: None **RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features** - Paper: https://arxiv.org/abs/2104.08569 - Code: https://github.com/zhanggang001/RefineMask/ **Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation** - Paper: https://arxiv.org/abs/2104.05239 - Code: https://github.com/tinyalpha/BPR **Multi-Scale Aligned Distillation for Low-Resolution Detection** - Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf - Code: https://github.com/Jia-Research-Lab/MSAD **Boundary IoU: Improving Object-Centric Image Segmentation Evaluation** - Homepage: https://bowenc0221.github.io/boundary-iou/ - Paper: https://arxiv.org/abs/2103.16562 - Code: https://github.com/bowenc0221/boundary-iou-api **Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers** - Paper: https://arxiv.org/abs/2103.12340 - Code: https://github.com/lkeab/BCNet **Zero-shot instance segmentation(Not Sure)** - Paper: None - Code: https://github.com/CVPR2021-pape-id-1395/CVPR2021-paper-id-1395 ## 视频实例分割 **STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation** - 
Paper: http://www4.comp.polyu.edu.hk/~cslzhang/papers.htm - Code: https://github.com/MinghanLi/STMask **End-to-End Video Instance Segmentation with Transformers** - Paper(Oral): https://arxiv.org/abs/2011.14503 - Code: https://github.com/Epiphqny/VisTR <a name="Panoptic-Segmentation"></a> # 全景分割(Panoptic Segmentation) **ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation** - Paper: https://arxiv.org/abs/2012.05258 - Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab - Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab **Part-aware Panoptic Segmentation** - Paper: https://arxiv.org/abs/2106.06351 - Code: https://github.com/tue-mps/panoptic_parts - Dataset: https://github.com/tue-mps/panoptic_parts **Exemplar-Based Open-Set Panoptic Segmentation Network** - Homepage: https://cv.snu.ac.kr/research/EOPSN/ - Paper: https://arxiv.org/abs/2105.08336 - Code: https://github.com/jd730/EOPSN **MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers** - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_MaX-DeepLab_End-to-End_Panoptic_Segmentation_With_Mask_Transformers_CVPR_2021_paper.html - Code: None **Panoptic Segmentation Forecasting** - Paper: https://arxiv.org/abs/2104.03962 - Code: https://github.com/nianticlabs/panoptic-forecasting **Fully Convolutional Networks for Panoptic Segmentation** - Paper: https://arxiv.org/abs/2012.00720 - Code: https://github.com/yanwei-li/PanopticFCN **Cross-View Regularization for Domain Adaptive Panoptic Segmentation** - Paper: https://arxiv.org/abs/2103.02584 - Code: None <a name="Medical-Image-Segmentation"></a> # 医学图像分割 **1. Learning Calibrated Medical Image Segmentation via Multi-Rater Agreement Modeling** - 作者单位: 腾讯天衍实验室, 北京同仁医院 - Paper(Best Paper Candidate): https://openaccess.thecvf.com/content/CVPR2021/html/Ji_Learning_Calibrated_Medical_Image_Segmentation_via_Multi-Rater_Agreement_Modeling_CVPR_2021_paper.html - Code: https://github.com/jiwei0921/MRNet/ **2. 
Every Annotation Counts: Multi-Label Deep Supervision for Medical Image Segmentation** - 作者单位: 卡尔斯鲁厄理工学院, 卡尔·蔡司等 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Reiss_Every_Annotation_Counts_Multi-Label_Deep_Supervision_for_Medical_Image_Segmentation_CVPR_2021_paper.html - Code: None **3. FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space** - 作者单位: 香港中文大学, 香港理工大学 - Paper: https://arxiv.org/abs/2103.06030 - Code: https://github.com/liuquande/FedDG-ELCFS **4. DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation** - 作者单位: 约翰斯·霍普金斯大学, NVIDIA - Paper(Oral): https://arxiv.org/abs/2103.15954 - Code: None **5. DARCNN: Domain Adaptive Region-Based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images** - 作者单位: 斯坦福大学 - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hsu_DARCNN_Domain_Adaptive_Region-Based_Convolutional_Neural_Network_for_Unsupervised_Instance_CVPR_2021_paper.html - Code: None <a name="VOS"></a> # 视频目标分割(Video-Object-Segmentation) **Learning Position and Target Consistency for Memory-based Video Object Segmentation** - Paper: https://arxiv.org/abs/2104.04329 - Code: None **SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation** - Paper(Oral): https://arxiv.org/abs/2101.08833 - Code: https://github.com/dukebw/SSTVOS <a name="IVOS"></a> # 交互式视频目标分割(Interactive-Video-Object-Segmentation) **Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion** - Homepage: https://hkchengrex.github.io/MiVOS/ - Paper: https://arxiv.org/abs/2103.07941 - Code: https://github.com/hkchengrex/MiVOS - Demo: https://hkchengrex.github.io/MiVOS/video.html#partb **Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild** - Paper: https://arxiv.org/abs/2103.10391 - Code: https://github.com/svip-lab/IVOS-W <a
name="Saliency-Detection"></a> # 显著性检测(Saliency Detection) **Uncertainty-aware Joint Salient Object and Camouflaged Object Detection** - Paper: https://arxiv.org/abs/2104.02628 - Code: https://github.com/JingZhang617/Joint_COD_SOD **Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion** - Paper(Oral): https://arxiv.org/abs/2103.11832 - Code: https://github.com/sunpeng1996/DSA2F <a name="Camouflaged-Object-Detection"></a> # 伪装物体检测(Camouflaged Object Detection) **Uncertainty-aware Joint Salient Object and Camouflaged Object Detection** - Paper: https://arxiv.org/abs/2104.02628 - Code: https://github.com/JingZhang617/Joint_COD_SOD <a name="CoSOD"></a> # 协同显著性检测(Co-Salient Object Detection) **Group Collaborative Learning for Co-Salient Object Detection** - Paper: https://arxiv.org/abs/2104.01108 - Code: https://github.com/fanq15/GCoNet <a name="Matting"></a> # 图像抠图(Image Matting) **Semantic Image Matting** - Paper: https://arxiv.org/abs/2104.08201 - Code: https://github.com/nowsyn/SIM - Dataset: https://github.com/nowsyn/SIM <a name="Re-ID"></a> # 行人重识别(Person Re-identification) **Generalizable Person Re-identification with Relevance-aware Mixture of Experts** - Paper: https://arxiv.org/abs/2105.09156 - Code: None **Unsupervised Multi-Source Domain Adaptation for Person Re-Identification** - Paper: https://arxiv.org/abs/2104.12961 - Code: None **Combined Depth Space based Architecture Search For Person Re-identification** - Paper: https://arxiv.org/abs/2104.04163 - Code: None <a name="Person-Search"></a> # 行人搜索(Person Search) **Anchor-Free Person Search** - Paper: https://arxiv.org/abs/2103.11617 - Code: https://github.com/daodaofr/AlignPS - Interpretation: [首个无需锚框(Anchor-Free)的行人搜索框架 | CVPR 2021](https://mp.weixin.qq.com/s/iqJkgp0JBanmeBPyHUkb-A) <a name="Video-Understanding"></a> # 视频理解/行为识别(Video Understanding) **Temporal-Relational CrossTransformers for Few-Shot Action Recognition** - Paper:
https://arxiv.org/abs/2101.06184 - Code: https://github.com/tobyperrett/trx **FrameExit: Conditional Early Exiting for Efficient Video Recognition** - Paper(Oral): https://arxiv.org/abs/2104.13400 - Code: None **No frame left behind: Full Video Action Recognition** - Paper: https://arxiv.org/abs/2103.15395 - Code: None **Learning Salient Boundary Feature for Anchor-free Temporal Action Localization** - Paper: https://arxiv.org/abs/2103.13137 - Code: None **Temporal Context Aggregation Network for Temporal Action Proposal Refinement** - Paper: https://arxiv.org/abs/2103.13141 - Code: None - Interpretation: [CVPR 2021 | TCANet:最强时序动作提名修正网络](https://mp.weixin.qq.com/s/UOWMfpTljkyZznHtpkQBhA) **ACTION-Net: Multipath Excitation for Action Recognition** - Paper: https://arxiv.org/abs/2103.07372 - Code: https://github.com/V-Sense/ACTION-Net **Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning** - Homepage: https://fingerrec.github.io/index_files/jinpeng/papers/CVPR2021/project_website.html - Paper: https://arxiv.org/abs/2009.05769 - Code: https://github.com/FingerRec/BE **TDN: Temporal Difference Networks for Efficient Action Recognition** - Paper: https://arxiv.org/abs/2012.10071 - Code: https://github.com/MCG-NJU/TDN <a name="Face-Recognition"></a> # 人脸识别(Face Recognition) **A 3D GAN for Improved Large-pose Facial Recognition** - Paper: https://arxiv.org/abs/2012.10545 - Code: None **MagFace: A Universal Representation for Face Recognition and Quality Assessment** - Paper(Oral): https://arxiv.org/abs/2103.06627 - Code: https://github.com/IrvingMeng/MagFace **WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition** - Homepage: https://www.face-benchmark.org/ - Paper: https://arxiv.org/abs/2103.04098 - Dataset: https://www.face-benchmark.org/ **When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework** - Paper(Oral): 
https://arxiv.org/abs/2103.01520 - Code: https://github.com/Hzzone/MTLFace - Dataset: https://github.com/Hzzone/MTLFace <a name="Face-Detection"></a> # 人脸检测(Face Detection) **HLA-Face: Joint High-Low Adaptation for Low Light Face Detection** - Homepage: https://daooshee.github.io/HLA-Face-Website/ - Paper: https://arxiv.org/abs/2104.01984 - Code: https://github.com/daooshee/HLA-Face-Code **CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement** - Paper: https://arxiv.org/abs/2103.07017 - Code: None <a name="Face-Anti-Spoofing"></a> # 人脸活体检测(Face Anti-Spoofing) **Cross Modal Focal Loss for RGBD Face Anti-Spoofing** - Paper: https://arxiv.org/abs/2103.00948 - Code: None <a name="Deepfake-Detection"></a> # Deepfake检测(Deepfake Detection) **Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain** - Paper: https://arxiv.org/abs/2103.01856 - Code: None **Multi-attentional Deepfake Detection** - Paper: https://arxiv.org/abs/2103.02406 - Code: None <a name="Age-Estimation"></a> # 人脸年龄估计(Age Estimation) **Continuous Face Aging via Self-estimated Residual Age Embedding** - Paper: https://arxiv.org/abs/2105.00020 - Code: None **PML: Progressive Margin Loss for Long-tailed Age Classification** - Paper: https://arxiv.org/abs/2103.02140 - Code: None <a name="FER"></a> # 人脸表情识别(Facial Expression Recognition) **Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition** - Paper: https://arxiv.org/abs/2103.13372 - Code: None <a name="Deepfakes"></a> # Deepfakes **MagDR: Mask-guided Detection and Reconstruction for Defending Deepfakes** - Paper: https://arxiv.org/abs/2103.14211 - Code: None <a name="Human-Parsing"></a> # 人体解析(Human Parsing) **Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing** - Paper: https://arxiv.org/abs/2103.04570 - Code: https://github.com/tfzhou/MG-HumanParsing <a name="Human-Pose-Estimation"></a> #
2D/3D人体姿态估计(2D/3D Human Pose Estimation) ## 2D 人体姿态估计 **ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search** - Paper: https://arxiv.org/abs/2105.10154 - Code: None **When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks** - Paper: https://arxiv.org/abs/2105.06152 - Code: None **Pose Recognition with Cascade Transformers** - Paper: https://arxiv.org/abs/2104.06976 - Code: https://github.com/mlpc-ucsd/PRTR **DCPose: Deep Dual Consecutive Network for Human Pose Estimation** - Paper: https://arxiv.org/abs/2103.07254 - Code: https://github.com/Pose-Group/DCPose ## 3D 人体姿态估计 **End-to-End Human Pose and Mesh Reconstruction with Transformers** - Paper: https://arxiv.org/abs/2012.09760 - Code: https://github.com/microsoft/MeshTransformer **PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation** - Paper(Oral): https://arxiv.org/abs/2105.02465 - Code: https://github.com/jfzhang95/PoseAug **Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration** - Paper: https://arxiv.org/abs/2103.02845 - Code: https://github.com/SeanChenxy/HandMesh **Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks** - Paper: https://arxiv.org/abs/2104.01797 - Code: https://github.com/3dpose/3D-Multi-Person-Pose **HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation** - Homepage: https://jeffli.site/HybrIK/ - Paper: https://arxiv.org/abs/2011.14672 - Code: https://github.com/Jeff-sjtu/HybrIK <a name="Animal-Pose-Estimation"></a> # 动物姿态估计(Animal Pose Estimation) **From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation** - Paper: https://arxiv.org/abs/2103.14843 - Code: None <a name="Hand-Pose-Estimation"></a> # 手部姿态估计(Hand Pose Estimation) **Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time** - Homepage: https://stevenlsw.github.io/Semi-Hand-Object/ - Paper:
https://arxiv.org/abs/2106.05266 - Code: https://github.com/stevenlsw/Semi-Hand-Object <a name="Human-Volumetric-Capture"></a> # Human Volumetric Capture **POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture** - Homepage: http://www.liuyebin.com/posefusion/posefusion.html - Paper(Oral): https://arxiv.org/abs/2103.15331 - Code: None <a name="Scene-Text-Recognition"></a> # Scene Text Detection **Fourier Contour Embedding for Arbitrary-Shaped Text Detection** - Paper: https://arxiv.org/abs/2104.10442 - Code: None <a name="Scene-Text-Recognition"></a> # Scene Text Recognition **Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition** - Paper: https://arxiv.org/abs/2103.06495 - Code: https://github.com/FangShancheng/ABINet <a name="Image-Compression"></a> # Image Compression **Checkerboard Context Model for Efficient Learned Image Compression** - Paper: https://arxiv.org/abs/2103.15306 - Code: None **Slimmable Compressive Autoencoders for Practical Neural Image Compression** - Paper: https://arxiv.org/abs/2103.15726 - Code: None **Attention-guided Image Compression by Deep Reconstruction of Compressive Sensed Saliency Skeleton** - Paper: https://arxiv.org/abs/2103.15368 - Code: None <a name="Model-Compression"></a> # Model Compression/Pruning/Quantization **Teachers Do More Than Teach: Compressing Image-to-Image Models** - Paper: https://arxiv.org/abs/2103.03467 - Code: https://github.com/snap-research/CAT ## Model Pruning **Dynamic Slimmable Network** - Paper: https://arxiv.org/abs/2103.13258 - Code: https://github.com/changlin31/DS-Net ## Model Quantization **Network Quantization with Element-wise Gradient Scaling** - Paper: https://arxiv.org/abs/2104.00903 - Code: None **Zero-shot Adversarial Quantization** - Paper(Oral): https://arxiv.org/abs/2103.15263 - Code: https://git.io/Jqc0y **Learnable Companding Quantization for Accurate Low-bit Neural Networks** - Paper: https://arxiv.org/abs/2103.07156 - Code: None <a name="KD"></a> # 
Knowledge Distillation **Distilling Knowledge via Knowledge Review** - Paper: https://arxiv.org/abs/2104.09044 - Code: https://github.com/Jia-Research-Lab/ReviewKD **Distilling Object Detectors via Decoupled Features** - Paper: https://arxiv.org/abs/2103.14475 - Code: https://github.com/ggjy/DeFeat.pytorch <a name="Super-Resolution"></a> # Super-Resolution **Image Super-Resolution with Non-Local Sparse Attention** - Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Mei_Image_Super-Resolution_With_Non-Local_Sparse_Attention_CVPR_2021_paper.pdf - Code: https://github.com/HarukiYqM/Non-Local-Sparse-Attention **Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline** - Homepage: http://mepro.bjtu.edu.cn/resource.html - Paper: https://arxiv.org/abs/2104.06174 - Code: None **ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic** - Paper: https://arxiv.org/abs/2103.04039 - Code: https://github.com/Xiangtaokong/ClassSR **AdderSR: Towards Energy Efficient Image Super-Resolution** - Paper: https://arxiv.org/abs/2009.08891 - Code: None ## Video Super-Resolution **Temporal Modulation Network for Controllable Space-Time Video Super-Resolution** - Paper: None - Code: https://github.com/CS-GangXu/TMNet <a name="Dehazing"></a> # Dehazing **Contrastive Learning for Compact Single Image Dehazing** - Paper: https://arxiv.org/abs/2104.09367 - Code: https://github.com/GlassyWu/AECR-Net <a name="Image-Restoration"></a> # Image Restoration **Multi-Stage Progressive Image Restoration** - Paper: https://arxiv.org/abs/2102.02808 - Code: https://github.com/swz30/MPRNet <a name="Image-Inpainting"></a> # Image Inpainting **PD-GAN: Probabilistic Diverse GAN for Image Inpainting** - Paper: https://arxiv.org/abs/2105.02201 - Code: https://github.com/KumapowerLIU/PD-GAN **TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations** - Homepage: 
https://yzhouas.github.io/projects/TransFill/index.html - Paper: https://arxiv.org/abs/2103.15982 - Code: None <a name="Image-Editing"></a> # Image Editing **StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing** - Paper: https://arxiv.org/abs/2104.14754 - Code: https://github.com/naver-ai/StyleMapGAN - Demo Video: https://youtu.be/qCapNyRA_Ng **High-Fidelity and Arbitrary Face Editing** - Paper: https://arxiv.org/abs/2103.15814 - Code: None **Anycost GANs for Interactive Image Synthesis and Editing** - Paper: https://arxiv.org/abs/2103.03243 - Code: https://github.com/mit-han-lab/anycost-gan **PISE: Person Image Synthesis and Editing with Decoupled GAN** - Paper: https://arxiv.org/abs/2103.04023 - Code: https://github.com/Zhangjinso/PISE **DeFLOCNet: Deep Image Editing via Flexible Low-level Controls** - Paper: http://raywzy.com/ - Code: http://raywzy.com/ **Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing** - Paper: None - Code: None <a name="Image-Captioning"></a> # Image Captioning **Towards Accurate Text-based Image Captioning with Content Diversity Exploration** - Paper: https://arxiv.org/abs/2105.03236 - Code: None <a name="Font-Generation"></a> # Font Generation **DG-Font: Deformable Generative Networks for Unsupervised Font Generation** - Paper: https://arxiv.org/abs/2104.03064 - Code: https://github.com/ecnuycxie/DG-Font <a name="Image-Matching"></a> # Image Matching **LoFTR: Detector-Free Local Feature Matching with Transformers** - Homepage: https://zju3dv.github.io/loftr/ - Paper: https://arxiv.org/abs/2104.00680 - Code: https://github.com/zju3dv/LoFTR **Convolutional Hough Matching Networks** - Homepage: http://cvlab.postech.ac.kr/research/CHM/ - Paper(Oral): https://arxiv.org/abs/2103.16831 - Code: None <a name="Image-Blending"></a> # Image Blending **Bridging the Visual Gap: Wide-Range Image Blending** - Paper: https://arxiv.org/abs/2103.15149 - Code: 
https://github.com/julia0607/Wide-Range-Image-Blending <a name="Reflection-Removal"></a> # Reflection Removal **Robust Reflection Removal with Reflection-free Flash-only Cues** - Paper: https://arxiv.org/abs/2103.04273 - Code: https://github.com/ChenyangLEI/flash-reflection-removal <a name="3D-C"></a> # 3D Point Cloud Classification **Equivariant Point Network for 3D Point Cloud Analysis** - Paper: https://arxiv.org/abs/2103.14147 - Code: None **PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds** - Paper: https://arxiv.org/abs/2103.14635 - Code: https://github.com/CVMI-Lab/PAConv <a name="3D-Object-Detection"></a> # 3D Object Detection **3D-MAN: 3D Multi-frame Attention Network for Object Detection** - Paper: https://arxiv.org/abs/2103.16054 - Code: None **Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds** - Paper: https://arxiv.org/abs/2104.06114 - Code: https://github.com/cheng052/BRNet **HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection** - Homepage: https://cvlab.yonsei.ac.kr/projects/HVPR/ - Paper: https://arxiv.org/abs/2104.00902 - Code: https://github.com/cvlab-yonsei/HVPR **LiDAR R-CNN: An Efficient and Universal 3D Object Detector** - Paper: https://arxiv.org/abs/2103.15297 - Code: https://github.com/tusimple/LiDAR_RCNN **M3DSSD: Monocular 3D Single Stage Object Detector** - Paper: https://arxiv.org/abs/2103.13164 - Code: https://github.com/mumianyuxin/M3DSSD **SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud** - Paper: None - Code: https://github.com/Vegeta2020/SE-SSD **Center-based 3D Object Detection and Tracking** - Paper: https://arxiv.org/abs/2006.11275 - Code: https://github.com/tianweiy/CenterPoint **Categorical Depth Distribution Network for Monocular 3D Object Detection** - Paper: https://arxiv.org/abs/2103.01100 - Code: None <a name="3D-Semantic-Segmentation"></a> # 3D Semantic Segmentation 
**Bidirectional Projection Network for Cross Dimension Scene Understanding** - Paper(Oral): https://arxiv.org/abs/2103.14326 - Code: https://github.com/wbhu/BPNet **Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion** - Paper: https://arxiv.org/abs/2103.07074 - Code: https://github.com/ShiQiu0419/BAAF-Net **Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation** - Paper: https://arxiv.org/abs/2011.10033 - Code: https://github.com/xinge008/Cylinder3D **Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges** - Homepage: https://github.com/QingyongHu/SensatUrban - Paper: http://arxiv.org/abs/2009.03137 - Code: https://github.com/QingyongHu/SensatUrban - Dataset: https://github.com/QingyongHu/SensatUrban <a name="3D-Panoptic-Segmentation"></a> # 3D Panoptic Segmentation **Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation** - Paper: https://arxiv.org/abs/2103.14962 - Code: https://github.com/edwardzhou130/Panoptic-PolarNet <a name="3D-Object-Tracking"></a> # 3D Object Tracking **Center-based 3D Object Detection and Tracking** - Paper: https://arxiv.org/abs/2006.11275 - Code: https://github.com/tianweiy/CenterPoint <a name="3D-PointCloud-Registration"></a> # 3D Point Cloud Registration **ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning** - Paper: https://arxiv.org/abs/2103.15231 - Code: None **PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency** - Paper: https://arxiv.org/abs/2103.05465 - Code: https://github.com/XuyangBai/PointDSC **PREDATOR: Registration of 3D Point Clouds with Low Overlap** - Paper: https://arxiv.org/abs/2011.13005 - Code: https://github.com/ShengyuH/OverlapPredator <a name="3D-Point-Cloud-Completion"></a> # 3D Point Cloud Completion **Unsupervised 3D Shape Completion through GAN Inversion** - Homepage: 
https://junzhezhang.github.io/projects/ShapeInversion/ - Paper: https://arxiv.org/abs/2104.13366 - Code: https://github.com/junzhezhang/shape-inversion **Variational Relational Point Completion Network** - Homepage: https://paul007pl.github.io/projects/VRCNet - Paper: https://arxiv.org/abs/2104.10154 - Code: https://github.com/paul007pl/VRCNet **Style-based Point Generator with Adversarial Rendering for Point Cloud Completion** - Homepage: https://alphapav.github.io/SpareNet/ - Paper: https://arxiv.org/abs/2103.02535 - Code: https://github.com/microsoft/SpareNet <a name="3D-Reconstruction"></a> # 3D Reconstruction **Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection** - Paper: http://arxiv.org/abs/2106.07852 - Code: https://github.com/TencentYoutuResearch/3DFaceReconstruction-LAP **Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction** - Paper: https://arxiv.org/abs/2104.00858 - Code: None **NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video** - Homepage: https://zju3dv.github.io/neuralrecon/ - Paper(Oral): https://arxiv.org/abs/2104.00681 - Code: https://github.com/zju3dv/NeuralRecon <a name="6D-Pose-Estimation"></a> # 6D Pose Estimation **FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism** - Paper(Oral): https://arxiv.org/abs/2103.07054 - Code: https://github.com/DC1991/FS-Net **GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation** - Paper: http://arxiv.org/abs/2102.12145 - Code: https://git.io/GDR-Net **FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation** - Paper: https://arxiv.org/abs/2103.02242 - Code: https://github.com/ethnhe/FFB6D <a name="Camera-Pose-Estimation"></a> # Camera Pose Estimation **Back to the Feature: Learning Robust Camera Localization from Pixels to Pose** - Paper: https://arxiv.org/abs/2103.09213 - Code: https://github.com/cvg/pixloc <a 
name="Depth-Estimation"></a> # Depth Estimation **S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation** - Paper(Oral): https://arxiv.org/abs/2104.00877 - Code: None **Beyond Image to Depth: Improving Depth Prediction using Echoes** - Homepage: https://krantiparida.github.io/projects/bimgdepth.html - Paper: https://arxiv.org/abs/2103.08468 - Code: https://github.com/krantiparida/beyond-image-to-depth **S3: Learnable Sparse Signal Superdensity for Guided Depth Estimation** - Paper: https://arxiv.org/abs/2103.02396 - Code: None **Depth from Camera Motion and Object Detection** - Paper: https://arxiv.org/abs/2103.01468 - Code: https://github.com/griffbr/ODMD - Dataset: https://github.com/griffbr/ODMD <a name="Stereo-Matching"></a> # Stereo Matching **A Decomposition Model for Stereo Matching** - Paper: https://arxiv.org/abs/2104.07516 - Code: None <a name="Flow-Estimation"></a> # Flow Estimation **Self-Supervised Multi-Frame Monocular Scene Flow** - Paper: https://arxiv.org/abs/2105.02216 - Code: https://github.com/visinf/multi-mono-sf **RAFT-3D: Scene Flow using Rigid-Motion Embeddings** - Paper: https://arxiv.org/abs/2012.00726v1 - Code: None **Learning Optical Flow From Still Images** - Homepage: https://mattpoggi.github.io/projects/cvpr2021aleotti/ - Paper: https://mattpoggi.github.io/assets/papers/aleotti2021cvpr.pdf - Code: https://github.com/mattpoggi/depthstillation **FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds** - Paper: https://arxiv.org/abs/2104.00798 - Code: None <a name="Lane-Detection"></a> # Lane Detection **Focus on Local: Detecting Lane Marker from Bottom Up via Key Point** - Paper: https://arxiv.org/abs/2105.13680 - Code: None **Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection** - Paper: https://arxiv.org/abs/2010.12035 - Code: https://github.com/lucastabelini/LaneATT <a name="Trajectory-Prediction"></a> # Trajectory Prediction 
**Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction** - Paper(Oral): https://arxiv.org/abs/2104.08277 - Code: None <a name="Crowd-Counting"></a> # Crowd Counting **Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark** - Paper: https://arxiv.org/abs/2105.02440 - Code: https://github.com/VisDrone/DroneCrowd - Dataset: https://github.com/VisDrone/DroneCrowd <a name="AE"></a> # Adversarial Examples **Enhancing the Transferability of Adversarial Attacks through Variance Tuning** - Paper: https://arxiv.org/abs/2103.15571 - Code: https://github.com/JHL-HUST/VT **LiBRe: A Practical Bayesian Approach to Adversarial Detection** - Paper: https://arxiv.org/abs/2103.14835 - Code: None **Natural Adversarial Examples** - Paper: https://arxiv.org/abs/1907.07174 - Code: https://github.com/hendrycks/natural-adv-examples <a name="Image-Retrieval"></a> # Image Retrieval **StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval** - Paper: https://arxiv.org/abs/2103.15706 - Code: None **QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval** - Paper: https://arxiv.org/abs/2103.02927 - Code: None <a name="Video-Retrieval"></a> # Video Retrieval **On Semantic Similarity in Video Retrieval** - Paper: https://arxiv.org/abs/2103.10095 - Homepage: https://mwray.github.io/SSVR/ - Code: https://github.com/mwray/Semantic-Video-Retrieval <a name="Cross-modal-Retrieval"></a> # Cross-modal Retrieval **Cross-Modal Center Loss for 3D Cross-Modal Retrieval** - Paper: https://arxiv.org/abs/2008.03561 - Code: https://github.com/LongLong-Jing/Cross-Modal-Center-Loss **Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers** - Paper: https://arxiv.org/abs/2103.16553 - Code: None **Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning** - Paper: 
https://www.amazon.science/publications/revamping-cross-modal-recipe-retrieval-with-hierarchical-transformers-and-self-supervised-learning - Code: https://github.com/amzn/image-to-recipe-transformers <a name="Zero-Shot-Learning"></a> # Zero-Shot Learning **Counterfactual Zero-Shot and Open-Set Visual Recognition** - Paper: https://arxiv.org/abs/2103.00887 - Code: https://github.com/yue-zhongqi/gcm-cf <a name="Federated-Learning"></a> # Federated Learning **FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space** - Paper: https://arxiv.org/abs/2103.06030 - Code: https://github.com/liuquande/FedDG-ELCFS <a name="Video-Frame-Interpolation"></a> # Video Frame Interpolation **CDFI: Compression-Driven Network Design for Frame Interpolation** - Paper: None - Code: https://github.com/tding1/CDFI **FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation** - Homepage: https://tarun005.github.io/FLAVR/ - Paper: https://arxiv.org/abs/2012.08512 - Code: https://github.com/tarun005/FLAVR <a name="Visual-Reasoning"></a> # Visual Reasoning **Transformation Driven Visual Reasoning** - Homepage: https://hongxin2019.github.io/TVR/ - Paper: https://arxiv.org/abs/2011.13160 - Code: https://github.com/hughplay/TVR <a name="Image-Synthesis"></a> # Image Synthesis **GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields** - Homepage: https://m-niemeyer.github.io/project-pages/giraffe/index.html - Paper(Oral): https://arxiv.org/abs/2011.12100 - Code: https://github.com/autonomousvision/giraffe - Demo: http://www.youtube.com/watch?v=fIaDXC-qRSg&vq=hd1080&autoplay=1 **Taming Transformers for High-Resolution Image Synthesis** - Homepage: https://compvis.github.io/taming-transformers/ - Paper(Oral): https://arxiv.org/abs/2012.09841 - Code: https://github.com/CompVis/taming-transformers <a name="Visual-Synthesis"></a> # View Synthesis **Stereo Radiance Fields 
(SRF): Learning View Synthesis for Sparse Views of Novel Scenes** - Homepage: https://virtualhumans.mpi-inf.mpg.de/srf/ - Paper: https://arxiv.org/abs/2104.06935 **Self-Supervised Visibility Learning for Novel View Synthesis** - Paper: https://arxiv.org/abs/2103.15407 - Code: None **NeX: Real-time View Synthesis with Neural Basis Expansion** - Homepage: https://nex-mpi.github.io/ - Paper(Oral): https://arxiv.org/abs/2103.05606 <a name="Style-Transfer"></a> # Style Transfer **Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer** - Paper: https://arxiv.org/abs/2104.05376 - Code: https://github.com/PaddlePaddle/PaddleGAN/ <a name="Layout-Generation"></a> # Layout Generation **LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity** - Paper: None - Code: None **Variational Transformer Networks for Layout Generation** - Paper: https://arxiv.org/abs/2104.02416 - Code: None <a name="Domain-Generalization"></a> # Domain Generalization **Generalization on Unseen Domains via Inference-time Label-Preserving Target Projections** - Paper(Oral): https://openaccess.thecvf.com/content/CVPR2021/papers/Pandey_Generalization_on_Unseen_Domains_via_Inference-Time_Label-Preserving_Target_Projections_CVPR_2021_paper.pdf - Code: https://github.com/VSumanth99/InferenceTimeDG **Generalizable Person Re-identification with Relevance-aware Mixture of Experts** - Paper: https://arxiv.org/abs/2105.09156 - Code: None **RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening** - Paper: https://arxiv.org/abs/2103.15597 - Code: https://github.com/shachoi/RobustNet **Adaptive Methods for Real-World Domain Generalization** - Paper: https://arxiv.org/abs/2103.15796 - Code: None **FSDR: Frequency Space Domain Randomization for Domain Generalization** - Paper: https://arxiv.org/abs/2103.02370 - Code: None <a name="Domain-Adaptation"></a> # Domain Adaptation **Curriculum Graph 
Co-Teaching for Multi-Target Domain Adaptation** - Paper: https://arxiv.org/abs/2104.00808 - Code: None **Domain Consensus Clustering for Universal Domain Adaptation** - Paper: http://reler.net/papers/guangrui_cvpr2021.pdf - Code: https://github.com/Solacex/Domain-Consensus-Clustering <a name="Open-Set"></a> # Open-Set **Towards Open World Object Detection** - Paper(Oral): https://arxiv.org/abs/2103.02603 - Code: https://github.com/JosephKJ/OWOD **Exemplar-Based Open-Set Panoptic Segmentation Network** - Homepage: https://cv.snu.ac.kr/research/EOPSN/ - Paper: https://arxiv.org/abs/2105.08336 - Code: https://github.com/jd730/EOPSN **Learning Placeholders for Open-Set Recognition** - Paper(Oral): https://arxiv.org/abs/2103.15086 - Code: None <a name="Adversarial-Attack"></a> # Adversarial Attack **IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking** - Paper: https://arxiv.org/abs/2103.14938 - Code: https://github.com/VISION-SJTU/IoUattack <a name="HOI"></a> # Human-Object Interaction (HOI) Detection **HOTR: End-to-End Human-Object Interaction Detection with Transformers** - Paper: https://arxiv.org/abs/2104.13682 - Code: None **Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information** - Paper: https://arxiv.org/abs/2103.05399 - Code: https://github.com/hitachi-rd-cv/qpic **Reformulating HOI Detection as Adaptive Set Prediction** - Paper: https://arxiv.org/abs/2103.05983 - Code: https://github.com/yoyomimi/AS-Net **Detecting Human-Object Interaction via Fabricated Compositional Learning** - Paper: https://arxiv.org/abs/2103.08214 - Code: https://github.com/zhihou7/FCL **End-to-End Human Object Interaction Detection with HOI Transformer** - Paper: https://arxiv.org/abs/2103.04503 - Code: https://github.com/bbepoch/HoiTransformer <a name="Shadow-Removal"></a> # Shadow Removal **Auto-Exposure Fusion for Single-Image Shadow Removal** - Paper: https://arxiv.org/abs/2103.01255 - Code: 
https://github.com/tsingqguo/exposure-fusion-shadow-removal <a name="Virtual-Try-On"></a> # Virtual Try-On **Parser-Free Virtual Try-on via Distilling Appearance Flows** - Paper: https://arxiv.org/abs/2103.04559 - Code: https://github.com/geyuying/PF-AFN <a name="Label-Noise"></a> # Label Noise **A Second-Order Approach to Learning with Instance-Dependent Label Noise** - Paper(Oral): https://arxiv.org/abs/2012.11854 - Code: https://github.com/UCSC-REAL/CAL <a name="Video-Stabilization"></a> # Video Stabilization **Real-Time Selfie Video Stabilization** - Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Yu_Real-Time_Selfie_Video_Stabilization_CVPR_2021_paper.pdf - Code: https://github.com/jiy173/selfievideostabilization <a name="Datasets"></a> # Datasets **Tracking Pedestrian Heads in Dense Crowd** - Homepage: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/ - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sundararaman_Tracking_Pedestrian_Heads_in_Dense_Crowd_CVPR_2021_paper.html - Code1: https://github.com/Sentient07/HeadHunter - Code2: https://github.com/Sentient07/HeadHunter%E2%80%93T - Dataset: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/ **Part-aware Panoptic Segmentation** - Paper: https://arxiv.org/abs/2106.06351 - Code: https://github.com/tue-mps/panoptic_parts - Dataset: https://github.com/tue-mps/panoptic_parts **Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos** - Homepage: https://www.yasamin.page/hdnet_tiktok - Paper(Oral): https://arxiv.org/abs/2103.03319 - Code: https://github.com/yasaminjafarian/HDNet_TikTok - Dataset: https://www.yasamin.page/hdnet_tiktok#h.jr9ifesshn7v **High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network** - Paper: https://arxiv.org/abs/2105.09188 - Code: https://github.com/csjliang/LPTN - Dataset: 
https://github.com/csjliang/LPTN **Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark** - Paper: https://arxiv.org/abs/2105.02440 - Code: https://github.com/VisDrone/DroneCrowd - Dataset: https://github.com/VisDrone/DroneCrowd **Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets** - Homepage: https://fidler-lab.github.io/efficient-annotation-cookbook/ - Paper(Oral): https://arxiv.org/abs/2104.12690 - Code: https://github.com/fidler-lab/efficient-annotation-cookbook **ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation** - Paper: https://arxiv.org/abs/2012.05258 - Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab - Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab **Learning To Count Everything** - Paper: https://arxiv.org/abs/2104.08391 - Code: https://github.com/cvlab-stonybrook/LearningToCountEverything - Dataset: https://github.com/cvlab-stonybrook/LearningToCountEverything **Semantic Image Matting** - Paper: https://arxiv.org/abs/2104.08201 - Code: https://github.com/nowsyn/SIM - Dataset: https://github.com/nowsyn/SIM **Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline** - Homepage: http://mepro.bjtu.edu.cn/resource.html - Paper: https://arxiv.org/abs/2104.06174 - Code: None **Visual Semantic Role Labeling for Video Understanding** - Homepage: https://vidsitu.org/ - Paper: https://arxiv.org/abs/2104.00990 - Code: https://github.com/TheShadow29/VidSitu - Dataset: https://github.com/TheShadow29/VidSitu **VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild** - Homepage: https://www.vspwdataset.com/ - Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf - GitHub: https://github.com/sssdddwww2/vspw_dataset_download **Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark** - Homepage: https://vap.aau.dk/sewer-ml/ - Paper: https://arxiv.org/abs/2103.10619 **Sewer-ML: A Multi-Label Sewer 
Defect Classification Dataset and Benchmark** - Homepage: https://vap.aau.dk/sewer-ml/ - Paper: https://arxiv.org/abs/2103.10895 **Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food** - Paper: https://arxiv.org/abs/2103.03375 - Dataset: None **Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges** - Homepage: https://github.com/QingyongHu/SensatUrban - Paper: http://arxiv.org/abs/2009.03137 - Code: https://github.com/QingyongHu/SensatUrban - Dataset: https://github.com/QingyongHu/SensatUrban **When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework** - Paper(Oral): https://arxiv.org/abs/2103.01520 - Code: https://github.com/Hzzone/MTLFace - Dataset: https://github.com/Hzzone/MTLFace **Depth from Camera Motion and Object Detection** - Paper: https://arxiv.org/abs/2103.01468 - Code: https://github.com/griffbr/ODMD - Dataset: https://github.com/griffbr/ODMD **There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge** - Homepage: http://rl.uni-freiburg.de/research/multimodal-distill - Paper: https://arxiv.org/abs/2103.01353 - Code: http://rl.uni-freiburg.de/research/multimodal-distill - Dataset: http://rl.uni-freiburg.de/research/multimodal-distill **Scan2Cap: Context-aware Dense Captioning in RGB-D Scans** - Paper: https://arxiv.org/abs/2012.02206 - Code: https://github.com/daveredrum/Scan2Cap - Dataset: https://github.com/daveredrum/ScanRefer <a name="Others"></a> # Others **Fast and Accurate Model Scaling** - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Dollar_Fast_and_Accurate_Model_Scaling_CVPR_2021_paper.html - Code: 
https://github.com/facebookresearch/pycls **Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos** - Homepage: https://www.yasamin.page/hdnet_tiktok - Paper(Oral): https://arxiv.org/abs/2103.03319 - Code: https://github.com/yasaminjafarian/HDNet_TikTok - Dataset: https://www.yasamin.page/hdnet_tiktok#h.jr9ifesshn7v **Omnimatte: Associating Objects and Their Effects in Video** - Homepage: https://omnimatte.github.io/ - Paper(Oral): https://arxiv.org/abs/2105.06993 - Code: https://omnimatte.github.io/#code **Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets** - Homepage: https://fidler-lab.github.io/efficient-annotation-cookbook/ - Paper(Oral): https://arxiv.org/abs/2104.12690 - Code: https://github.com/fidler-lab/efficient-annotation-cookbook **Motion Representations for Articulated Animation** - Paper: https://arxiv.org/abs/2104.11280 - Code: https://github.com/snap-research/articulated-animation **Deep Lucas-Kanade Homography for Multimodal Image Alignment** - Paper: https://arxiv.org/abs/2104.11693 - Code: https://github.com/placeforyiming/CVPR21-Deep-Lucas-Kanade-Homography **Skip-Convolutions for Efficient Video Processing** - Paper: https://arxiv.org/abs/2104.11487 - Code: None **KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control** - Homepage: http://tomasjakab.github.io/KeypointDeformer - Paper(Oral): https://arxiv.org/abs/2104.11224 - Code: https://github.com/tomasjakab/keypoint_deformer/ **Learning To Count Everything** - Paper: https://arxiv.org/abs/2104.08391 - Code: https://github.com/cvlab-stonybrook/LearningToCountEverything - Dataset: https://github.com/cvlab-stonybrook/LearningToCountEverything **SOLD2: Self-supervised Occlusion-aware Line Description and Detection** - Paper(Oral): https://arxiv.org/abs/2104.03362 - Code: https://github.com/cvg/SOLD2 **Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression** - Homepage: 
https://li-wanhua.github.io/POEs/ - Paper: https://arxiv.org/abs/2103.13629 - Code: https://github.com/Li-Wanhua/POEs **LEAP: Learning Articulated Occupancy of People** - Paper: https://arxiv.org/abs/2104.06849 - Code: None **Visual Semantic Role Labeling for Video Understanding** - Homepage: https://vidsitu.org/ - Paper: https://arxiv.org/abs/2104.00990 - Code: https://github.com/TheShadow29/VidSitu - Dataset: https://github.com/TheShadow29/VidSitu **UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles** - Paper: https://arxiv.org/abs/2104.00946 - Code: https://github.com/SUTDCV/UAV-Human **Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning** - Paper(Oral): https://arxiv.org/abs/2104.00924 - Code: None **Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction** - Paper: https://arxiv.org/abs/2104.00858 - Code: None **Towards High Fidelity Face Relighting with Realistic Shadows** - Paper: https://arxiv.org/abs/2104.00825 - Code: None **BRepNet: A topological message passing system for solid models** - Paper(Oral): https://arxiv.org/abs/2104.00706 - Code: None **Visually Informed Binaural Audio Generation without Binaural Audios** - Homepage: https://sheldontsui.github.io/projects/PseudoBinaural - Paper: None - GitHub: https://github.com/SheldonTsui/PseudoBinaural_CVPR2021 - Demo: https://www.youtube.com/watch?v=r-uC2MyAWQc **Exploring intermediate representation for monocular vehicle pose estimation** - Paper: None - Code: https://github.com/Nicholasli1995/EgoNet **Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGB** - Paper(Oral): https://arxiv.org/abs/2103.14708 - Code: None **Invertible Image Signal Processing** - Paper: https://arxiv.org/abs/2103.15061 - Code: https://github.com/yzxing87/Invertible-ISP **Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling** - Paper: https://arxiv.org/abs/2103.14858 - Code: 
None **SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences** - Paper: https://arxiv.org/abs/2103.14898 - Code: None **Embedding Transfer with Label Relaxation for Improved Metric Learning** - Paper: https://arxiv.org/abs/2103.14908 - Code: None **Picasso: A CUDA-based Library for Deep Learning over 3D Meshes** - Paper: https://arxiv.org/abs/2103.15076 - Code: https://github.com/hlei-ziyan/Picasso **Meta-Mining Discriminative Samples for Kinship Verification** - Paper: https://arxiv.org/abs/2103.15108 - Code: None **Cloud2Curve: Generation and Vectorization of Parametric Sketches** - Paper: https://arxiv.org/abs/2103.15536 - Code: None **TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events** - Paper: https://arxiv.org/abs/2103.15538 - Code: https://github.com/SUTDCV/SUTD-TrafficQA **Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution** - Homepage: http://wellyzhang.github.io/project/prae.html - Paper: https://arxiv.org/abs/2103.14230 - Code: None **ACRE: Abstract Causal REasoning Beyond Covariation** - Homepage: http://wellyzhang.github.io/project/acre.html - Paper: https://arxiv.org/abs/2103.14232 - Code: None **Confluent Vessel Trees with Accurate Bifurcations** - Paper: https://arxiv.org/abs/2103.14268 - Code: None **Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling** - Paper: https://arxiv.org/abs/2103.14338 - Code: https://github.com/HuangZhiChao95/FewShotMotionTransfer **Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks** - Homepage: https://paschalidoud.github.io/neural_parts - Paper: None - Code: https://github.com/paschalidoud/neural_parts **Knowledge Evolution in Neural Networks** - Paper(Oral): https://arxiv.org/abs/2103.05152 - Code: https://github.com/ahmdtaha/knowledge_evolution **Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image 
Reconstruction Using Federated Learning** - Paper: https://arxiv.org/abs/2103.02148 - Code: https://github.com/guopengf/FLMRCM **SGP: Self-supervised Geometric Perception** - Paper(Oral): https://arxiv.org/abs/2103.03114 - Code: https://github.com/theNded/SGP **Diffusion Probabilistic Models for 3D Point Cloud Generation** - Paper: https://arxiv.org/abs/2103.01458 - Code: https://github.com/luost26/diffusion-point-cloud **Scan2Cap: Context-aware Dense Captioning in RGB-D Scans** - Paper: https://arxiv.org/abs/2012.02206 - Code: https://github.com/daveredrum/Scan2Cap - Dataset: https://github.com/daveredrum/ScanRefer **There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge** - Paper: https://arxiv.org/abs/2103.01353 - Code: http://rl.uni-freiburg.de/research/multimodal-distill - Dataset: http://rl.uni-freiburg.de/research/multimodal-distill <a name="TO-DO"></a> # To Be Added (TODO) - [Big news! 20 Tencent YouTu papers accepted to CVPR 2021](https://mp.weixin.qq.com/s/McAtOVh0osWZ3uppEoHC8A) - [Three papers from the MePro team accepted to CVPR 2021](https://mp.weixin.qq.com/s/GD5Zb6u_MQ8GZIAGeCGo3Q) <a name="Not-Sure"></a> # Acceptance Unconfirmed (Not Sure) **CT Film Recovery via Disentangling Geometric Deformation and Photometric Degradation: Simulated Datasets and Deep Models** - Paper: None - Code: https://github.com/transcendentsky/Film-Recovery **Toward Explainable Reflection Removal with Distilling and Model Uncertainty** - Paper: None - Code: https://github.com/ytpeng-aimlab/CVPR-2021-Toward-Explainable-Reflection-Removal-with-Distilling-and-Model-Uncertainty **DeepOIS: Gyroscope-Guided Deep Optical Image Stabilizer Compensation** - Paper: None - Code: https://github.com/lhaippp/DeepOIS **Exploring Adversarial Fake Images on Face Manifold** - Paper:
None - Code: https://github.com/ldz666666/Style-atk **Uncertainty-Aware Semi-Supervised Crowd Counting via Consistency-Regularized Surrogate Task** - Paper: None - Code: https://github.com/yandamengdanai/Uncertainty-Aware-Semi-Supervised-Crowd-Counting-via-Consistency-Regularized-Surrogate-Task **Temporal Contrastive Graph for Self-supervised Video Representation Learning** - Paper: None - Code: https://github.com/YangLiu9208/TCG **Boosting Monocular Depth Estimation Models to High-Resolution via Context-Aware Patching** - Paper: None - Code: https://github.com/ouranonymouscvpr/cvpr2021_ouranonymouscvpr **Fast and Memory-Efficient Compact Bilinear Pooling** - Paper: None - Code: https://github.com/cvpr2021kp2/cvpr2021kp2 **Identification of Empty Shelves in Supermarkets using Domain-inspired Features with Structural Support Vector Machine** - Paper: None - Code: https://github.com/gapDetection/cvpr2021 **Estimating A Child's Growth Potential From Cephalometric X-Ray Image via Morphology-Aware Interactive Keypoint Estimation** - Paper: None - Code: https://github.com/interactivekeypoint2020/Morph https://github.com/ShaoQiangShen/CVPR2021 https://github.com/gillesflash/CVPR2021 https://github.com/anonymous-submission1991/BaLeNAS https://github.com/cvpr2021dcb/cvpr2021dcb https://github.com/anonymousauthorCV/CVPR2021_PaperID_8578 https://github.com/AldrichZeng/FreqPrune https://github.com/Anonymous-AdvCAM/Anonymous-AdvCAM https://github.com/ddfss/datadrive-fss ================================================ FILE: CVPR2022-Papers-with-Code.md ================================================ # CVPR 2022 Papers and Open-Source Projects (Papers with Code) A collection of [CVPR 2022](https://cvpr2022.thecvf.com/) papers and open-source projects (papers with code)! CVPR 2022 accepted paper ID list: https://drive.google.com/file/d/15JFhfPboKdUcIH9LdbCMUFmGq_JhaxhC/view > Note 1: You are welcome to open an issue to share CVPR 2022 papers and open-source projects!
> > Note 2: For papers from previous years' top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision > > - [CVPR 2019](CVPR2019-Papers-with-Code.md) > - [CVPR 2020](CVPR2020-Papers-with-Code.md) > - [CVPR 2021](CVPR2021-Papers-with-Code.md) If you want to keep up with the latest high-quality CV papers, open-source projects, and learning materials, scan the QR code to join the CVer academic exchange group. Let's learn from each other and improve together! ## CVPR 2022 Open-Source Papers: Table of Contents - [Backbone](#Backbone) - [CLIP](#CLIP) - [GAN](#GAN) - [GNN](#GNN) - [MLP](#MLP) - [NAS](#NAS) - [OCR](#OCR) - [NeRF](#NeRF) - [3D Face](#3D Face) - [Long-Tail](#Long-Tail) - [Visual Transformer](#Visual-Transformer) - [Vision-Language](#VL) - [Self-supervised Learning](#SSL) - [Data Augmentation](#DA) - [Knowledge Distillation](#KD) - [Object Detection](#Object-Detection) - [Visual Tracking](#VT) - [Semantic Segmentation](#Semantic-Segmentation) - [Instance Segmentation](#Instance-Segmentation) - [Panoptic Segmentation](#Panoptic-Segmentation) - [Few-Shot Classification](#FFC) - [Few-Shot Segmentation](#FFS) - [Image Matting](#Matting) - [Video Understanding](#VU) - [Image Editing](#Image-Editing) - [Low-level Vision](#LLV) - [Super-Resolution](#Super-Resolution) - [Deblur](#Deblur) - [3D Point Cloud](#3D-Point-Cloud) - [3D Object Detection](#3D-Object-Detection) - [3D Semantic Segmentation](#3DSS) - [3D Object Tracking](#3D-Object-Tracking) - [3D Human Pose Estimation](#3D-Human-Pose-Estimation) - [3D Semantic Scene Completion](#3DSSC) - [3D Reconstruction](#3D-R) - [Person Re-identification](#ReID) - [Camouflaged Object Detection](#COD) - [Depth Estimation](#Depth-Estimation) - [Stereo Matching](#Stereo-Matching) - [Feature Matching](#FM) - [Lane Detection](#Lane-Detection) - [Optical Flow Estimation](#Optical-Flow-Estimation) - [Image Inpainting](#Image-Inpainting) - [Image Retrieval](#Image-Retrieval) - [Face Recognition](#Face-Recognition) - [Crowd Counting](#Crowd-Counting) - [Medical Image](#Medical-Image) - [Video Generation](#Video Generation) - [Scene Graph Generation](#Scene-Graph-Generation) - [Referring Video Object Segmentation](#R-VOS) - [Gait Recognition](#GR) - [Style Transfer](#ST) - [Anomaly Detection](#AD) - [Adversarial Examples](#AE) - [Weakly Supervised Object Localization](#WSOL) - [Radar Object Detection](#ROD) - [Hyperspectral Image Reconstruction](#HSI) - [Image Stitching](#Image-Stitching) - [Watermarking](#Watermarking) - [Action Counting](#AC) - [Grounded Situation Recognition](#GSR) - [Zero-shot Learning](#ZSL) - [DeepFakes](#DeepFakes) - [Datasets](#Datasets) - [New Tasks](#New-Tasks) - [Others](#Others) <a name="Backbone"></a> # Backbone **A ConvNet for the 2020s** - Paper: https://arxiv.org/abs/2201.03545 - Code: https://github.com/facebookresearch/ConvNeXt - Chinese explainer: https://mp.weixin.qq.com/s/Xg5wPYExnvTqRo6s5-2cAw **Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs** - Paper: https://arxiv.org/abs/2203.06717 - Code: https://github.com/megvii-research/RepLKNet - Code2: https://github.com/DingXiaoH/RepLKNet-pytorch - Chinese explainer: https://mp.weixin.qq.com/s/_qXyIQut-JRW6VvsjaQlFg **MPViT: Multi-Path Vision Transformer for Dense Prediction** - Paper: https://arxiv.org/abs/2112.11010 - Code: https://github.com/youngwanLEE/MPViT - Chinese explainer: https://mp.weixin.qq.com/s/Q9-crEOz5IYzZaNoq8oXfg **Mobile-Former: Bridging MobileNet and Transformer** - Paper: https://arxiv.org/abs/2108.05895 - Code: None - Chinese explainer: https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ **MetaFormer is Actually What You Need for Vision** - Paper: https://arxiv.org/abs/2111.11418 - Code: https://github.com/sail-sg/poolformer **Shunted Self-Attention via Multi-Scale Token Aggregation** - Paper(Oral): https://arxiv.org/abs/2111.15193 - Code:
https://github.com/OliverRensu/Shunted-Transformer **TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing** - Paper: http://arxiv.org/abs/2203.10489 - Code: https://github.com/JierunChen/TVConv **Learned Queries for Efficient Local Attention** - Paper(Oral): https://arxiv.org/abs/2112.11435 - Code: https://github.com/moabarar/qna **RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality** - Paper: https://arxiv.org/abs/2112.11081 - Code: https://github.com/DingXiaoH/RepMLP <a name="CLIP"></a> # CLIP **HairCLIP: Design Your Hair by Text and Reference Image** - Paper: https://arxiv.org/abs/2112.05142 - Code: https://github.com/wty-ustc/HairCLIP **PointCLIP: Point Cloud Understanding by CLIP** - Paper: https://arxiv.org/abs/2112.02413 - Code: https://github.com/ZrrSkywalker/PointCLIP **Blended Diffusion for Text-driven Editing of Natural Images** - Paper: https://arxiv.org/abs/2111.14818 - Code: https://github.com/omriav/blended-diffusion <a name="GAN"></a> # GAN **SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing** - Homepage: https://semanticstylegan.github.io/ - Paper: https://arxiv.org/abs/2112.02236 - Demo: https://semanticstylegan.github.io/videos/demo.mp4 **Style Transformer for Image Inversion and Editing** - Paper: https://arxiv.org/abs/2203.07932 - Code: https://github.com/sapphire497/style-transformer **Unsupervised Image-to-Image Translation with Generative Prior** - Homepage: https://www.mmlab-ntu.com/project/gpunit/ - Paper: https://arxiv.org/abs/2204.03641 - Code: https://github.com/williamyang1991/GP-UNIT **StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2** - Homepage: https://universome.github.io/stylegan-v - Paper: https://arxiv.org/abs/2112.14683 - Code: https://github.com/universome/stylegan-v **OSSGAN: Open-set Semi-supervised Image Generation** - Paper: https://arxiv.org/abs/2204.14249 - Code: 
https://github.com/raven38/OSSGAN **Neural Texture Extraction and Distribution for Controllable Person Image Synthesis** - Paper: https://arxiv.org/abs/2204.06160 - Code: https://github.com/RenYurui/Neural-Texture-Extraction-Distribution <a name="GNN"></a> # GNN **OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks** - Paper: https://wanyu-lin.github.io/assets/publications/wanyu-cvpr2022.pdf - Code: https://github.com/WanyuGroup/CVPR2022-OrphicX <a name="MLP"></a> # MLP **RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality** - Paper: https://arxiv.org/abs/2112.11081 - Code: https://github.com/DingXiaoH/RepMLP <a name="NAS"></a> # NAS **β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search** - Paper: https://arxiv.org/abs/2203.01665 - Code: https://github.com/Sunshine-Ye/Beta-DARTS **ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior** - Paper: https://arxiv.org/abs/2111.15362 - Code: None <a name="OCR"></a> # OCR **SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition** - Paper: https://arxiv.org/abs/2203.10209 - Code: https://github.com/mxin262/SwinTextSpotter <a name="NeRF"></a> # NeRF **Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields** - Homepage: https://jonbarron.info/mipnerf360/ - Paper: https://arxiv.org/abs/2111.12077 - Demo: https://youtu.be/YStDS2-Ln1s **Point-NeRF: Point-based Neural Radiance Fields** - Homepage: https://xharlie.github.io/projects/project_sites/pointnerf/ - Paper: https://arxiv.org/abs/2201.08845 - Code: https://github.com/Xharlie/point-nerf **NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images** - Paper: https://arxiv.org/abs/2111.13679 - Homepage: https://bmild.github.io/rawnerf/ - Demo: https://www.youtube.com/watch?v=JtBS4KBcKVc **Urban Radiance Fields** - Homepage: https://urban-radiance-fields.github.io/ - Paper: https://arxiv.org/abs/2111.14643 - Demo: 
https://youtu.be/qGlq5DZT6uc **Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation** - Paper: https://arxiv.org/abs/2202.13162 - Code: https://github.com/HexagonPrime/Pix2NeRF **HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video** - Homepage: https://grail.cs.washington.edu/projects/humannerf/ - Paper: https://arxiv.org/abs/2201.04127 - Demo: https://youtu.be/GM-RoZEymmw <a name="3D Face"></a> # 3D Face **ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural Representations** - Paper: https://arxiv.org/abs/2203.14510 - Code: https://github.com/MingwuZheng/ImFace <a name="Long-Tail"></a> # Long-Tail **Retrieval Augmented Classification for Long-Tail Visual Recognition** - Paper: https://arxiv.org/abs/2202.11233 - Code: None <a name="Visual-Transformer"></a> # Visual Transformer ## Backbone **MPViT: Multi-Path Vision Transformer for Dense Prediction** - Paper: https://arxiv.org/abs/2112.11010 - Code: https://github.com/youngwanLEE/MPViT **MetaFormer is Actually What You Need for Vision** - Paper: https://arxiv.org/abs/2111.11418 - Code: https://github.com/sail-sg/poolformer **Mobile-Former: Bridging MobileNet and Transformer** - Paper: https://arxiv.org/abs/2108.05895 - Code: None - Chinese explainer: https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ **Shunted Self-Attention via Multi-Scale Token Aggregation** - Paper(Oral): https://arxiv.org/abs/2111.15193 - Code: https://github.com/OliverRensu/Shunted-Transformer **Learned Queries for Efficient Local Attention** - Paper(Oral): https://arxiv.org/abs/2112.11435 - Code: https://github.com/moabarar/qna ## Application **Language-based Video Editing via Multi-Modal Multi-Level Transformer** - Paper: https://arxiv.org/abs/2104.01122 - Code: None **MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video** - Paper: https://arxiv.org/abs/2203.00859 - Code: None **Embracing Single Stride 3D Object Detector with Sparse
Transformer** - Paper: https://arxiv.org/abs/2112.06375 - Code: https://github.com/TuSimple/SST - Chinese explainer: https://zhuanlan.zhihu.com/p/476056546 **Multi-class Token Transformer for Weakly Supervised Semantic Segmentation** - Paper: https://arxiv.org/abs/2203.02891 - Code: https://github.com/xulianuwa/MCTformer **Spatio-temporal Relation Modeling for Few-shot Action Recognition** - Paper: https://arxiv.org/abs/2112.05132 - Code: https://github.com/Anirudh257/strm **Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction** - Paper: https://arxiv.org/abs/2111.07910 - Code: https://github.com/caiyuanhao1998/MST **Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling** - Homepage: https://point-bert.ivg-research.xyz/ - Paper: https://arxiv.org/abs/2111.14819 - Code: https://github.com/lulutang0608/Point-BERT **GroupViT: Semantic Segmentation Emerges from Text Supervision** - Homepage: https://jerryxu.net/GroupViT/ - Paper: https://arxiv.org/abs/2202.11094 - Demo: https://youtu.be/DtJsWIUTW-Y **Restormer: Efficient Transformer for High-Resolution Image Restoration** - Paper: https://arxiv.org/abs/2111.09881 - Code: https://github.com/swz30/Restormer **Splicing ViT Features for Semantic Appearance Transfer** - Homepage: https://splice-vit.github.io/ - Paper: https://arxiv.org/abs/2201.00424 - Code: https://github.com/omerbt/Splice **Self-supervised Video Transformer** - Homepage: https://kahnchana.github.io/svt/ - Paper: https://arxiv.org/abs/2112.01514 - Code: https://github.com/kahnchana/svt **Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers** - Paper: https://arxiv.org/abs/2203.02664 - Code: https://github.com/rulixiang/afa **Accelerating DETR Convergence via Semantic-Aligned Matching** - Paper: https://arxiv.org/abs/2203.06883 - Code: https://github.com/ZhangGongjie/SAM-DETR **DN-DETR: Accelerate DETR Training by Introducing Query DeNoising** - Paper:
https://arxiv.org/abs/2203.01305 - Code: https://github.com/FengLi-ust/DN-DETR - Chinese explainer: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w **Style Transformer for Image Inversion and Editing** - Paper: https://arxiv.org/abs/2203.07932 - Code: https://github.com/sapphire497/style-transformer **MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer** - Paper: https://arxiv.org/abs/2203.10981 - Code: https://github.com/kuanchihhuang/MonoDTR **Mask Transfiner for High-Quality Instance Segmentation** - Paper: https://arxiv.org/abs/2111.13673 - Code: https://github.com/SysCV/transfiner **Language as Queries for Referring Video Object Segmentation** - Paper: https://arxiv.org/abs/2201.00487 - Code: https://github.com/wjn922/ReferFormer - Chinese explainer: https://mp.weixin.qq.com/s/MkQT8QWSYoYVhJ1RSF6oPQ **X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning** - Paper: https://arxiv.org/abs/2203.00843 - Code: https://github.com/CurryYuan/X-Trans2Cap **AdaMixer: A Fast-Converging Query-Based Object Detector** - Paper(Oral): https://arxiv.org/abs/2203.16507 - Code: https://github.com/MCG-NJU/AdaMixer **Omni-DETR: Omni-Supervised Object Detection with Transformers** - Paper: https://arxiv.org/abs/2203.16089 - Code: https://github.com/amazon-research/omni-detr **SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition** - Paper: https://arxiv.org/abs/2203.10209 - Code: https://github.com/mxin262/SwinTextSpotter **TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting** - Paper(Oral): https://arxiv.org/abs/2204.01018 - Code: https://github.com/SvipRepetitionCounting/TransRAC **Collaborative Transformers for Grounded Situation Recognition** - Paper: https://arxiv.org/abs/2203.16518 - Code: https://github.com/jhcho99/CoFormer **NFormer: Robust Person Re-identification with Neighbor Transformer** - Paper: https://arxiv.org/abs/2204.09331 - Code:
https://github.com/haochenheheda/NFormer **Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation** - Paper: https://arxiv.org/abs/2201.06889 - Code: None **Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer** - Paper(Oral): https://arxiv.org/abs/2204.08680 - Code: https://github.com/zengwang430521/TCFormer **A New Dataset and Transformer for Stereoscopic Video Super-Resolution** - Paper: https://arxiv.org/abs/2204.10039 - Code: https://github.com/H-deep/Trans-SVSR/ - Dataset: http://shorturl.at/mpwGX **Safe Self-Refinement for Transformer-based Domain Adaptation** - Paper: https://arxiv.org/abs/2204.07683 - Code: https://github.com/tsun/SSRT **Fast Point Transformer** - Homepage: http://cvlab.postech.ac.kr/research/FPT/ - Paper: https://arxiv.org/abs/2112.04702 - Code: https://github.com/POSTECH-CVLab/FastPointTransformer **Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval** - Paper: https://arxiv.org/abs/2204.09730 - Code: https://github.com/mshukor/TFood **DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation** - Paper: https://arxiv.org/abs/2111.14887 - Code: https://github.com/lhoyer/DAFormer **Stratified Transformer for 3D Point Cloud Segmentation** - Paper: https://arxiv.org/pdf/2203.14508.pdf - Code: https://github.com/dvlab-research/Stratified-Transformer <a name="VL"></a> # Vision-Language **Conditional Prompt Learning for Vision-Language Models** - Paper: https://arxiv.org/abs/2203.05557 - Code: https://github.com/KaiyangZhou/CoOp **Bridging Video-text Retrieval with Multiple Choice Question** - Paper: https://arxiv.org/abs/2201.04850 - Code: https://github.com/TencentARC/MCQ **Visual Abductive Reasoning** - Paper: https://arxiv.org/abs/2203.14040 - Code: https://github.com/leonnnop/VAR <a name="SSL"></a> # Self-supervised Learning **UniVIP: A Unified Framework for Self-Supervised
Visual Pre-training** - Paper: https://arxiv.org/abs/2203.06965 - Code: None **Crafting Better Contrastive Views for Siamese Representation Learning** - Paper: https://arxiv.org/abs/2202.03278 - Code: https://github.com/xyupeng/ContrastiveCrop - Chinese explainer: https://mp.weixin.qq.com/s/VTP9D5f7KG9vg30U9kVI2A **HCSC: Hierarchical Contrastive Selective Coding** - Homepage: https://github.com/gyfastas/HCSC - Paper: https://arxiv.org/abs/2202.00455 - Chinese explainer: https://mp.weixin.qq.com/s/jkYR8mYp-e645qk8kfPNKQ **DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis** - Paper: https://arxiv.org/abs/2204.10437 - Code: https://github.com/JLiangLab/DiRA <a name="DA"></a> # Data Augmentation **TeachAugment: Data Augmentation Optimization Using Teacher Knowledge** - Paper: https://arxiv.org/abs/2202.12513 - Code: https://github.com/DensoITLab/TeachAugment **AlignMixup: Improving Representations By Interpolating Aligned Features** - Paper: https://arxiv.org/abs/2103.15375 - Code: https://github.com/shashankvkt/AlignMixup_CVPR22 <a name="KD"></a> # Knowledge Distillation **Decoupled Knowledge Distillation** - Paper: https://arxiv.org/abs/2203.08679 - Code: https://github.com/megvii-research/mdistiller - Chinese explainer: https://mp.weixin.qq.com/s/-4AA0zKIXh9Ei9-vc5jOhw <a name="Object-Detection"></a> # Object Detection **BoxeR: Box-Attention for 2D and 3D Transformers** - Paper: https://arxiv.org/abs/2111.13087 - Code: https://github.com/kienduynguyen/BoxeR - Chinese explainer: https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w **DN-DETR: Accelerate DETR Training by Introducing Query DeNoising** - Paper: https://arxiv.org/abs/2203.01305 - Code: https://github.com/FengLi-ust/DN-DETR - Chinese explainer: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w **Accelerating DETR Convergence via Semantic-Aligned Matching** - Paper: https://arxiv.org/abs/2203.06883 - Code: https://github.com/ZhangGongjie/SAM-DETR **Localization Distillation for Dense Object Detection** -
Paper: https://arxiv.org/abs/2102.12252 - Code: https://github.com/HikariTJU/LD - Chinese explainer: https://mp.weixin.qq.com/s/dxss8RjJH283h6IbPCT9vg **Focal and Global Knowledge Distillation for Detectors** - Paper: https://arxiv.org/abs/2111.11837 - Code: https://github.com/yzd-v/FGD - Chinese explainer: https://mp.weixin.qq.com/s/yDkreTudC8JL2V2ETsADwQ **A Dual Weighting Label Assignment Scheme for Object Detection** - Paper: https://arxiv.org/abs/2203.09730 - Code: https://github.com/strongwolf/DW **AdaMixer: A Fast-Converging Query-Based Object Detector** - Paper(Oral): https://arxiv.org/abs/2203.16507 - Code: https://github.com/MCG-NJU/AdaMixer **Omni-DETR: Omni-Supervised Object Detection with Transformers** - Paper: https://arxiv.org/abs/2203.16089 - Code: https://github.com/amazon-research/omni-detr **SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection** - Paper(Oral): https://arxiv.org/abs/2203.06398 - Code: https://github.com/CityU-AIM-Group/SIGMA ## Semi-Supervised Object Detection **Dense Learning based Semi-Supervised Object Detection** - Paper: https://arxiv.org/abs/2204.07300 - Code: https://github.com/chenbinghui1/DSL <a name="VT"></a> # Visual Tracking **Correlation-Aware Deep Tracking** - Paper: https://arxiv.org/abs/2203.01666 - Code: None **TCTrack: Temporal Contexts for Aerial Tracking** - Paper: https://arxiv.org/abs/2203.01885 - Code: https://github.com/vision4robotics/TCTrack ## Multi-Modal Object Tracking **Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline** - Homepage: https://zhang-pengyu.github.io/DUT-VTUAV/ - Paper: https://arxiv.org/abs/2204.04120 ## Multi-Object Tracking **Learning of Global Objective for Network Flow in Multi-Object Tracking** - Paper: https://arxiv.org/abs/2203.16210 - Code: None **DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion** - Homepage: https://dancetrack.github.io - Paper: https://arxiv.org/abs/2111.14690 - Dataset: https://github.com/DanceTrack/DanceTrack <a
name="Semantic-Segmentation"></a> # Semantic Segmentation **Novel Class Discovery in Semantic Segmentation** - Homepage: https://ncdss.github.io/ - Paper: https://arxiv.org/abs/2112.01900 - Code: https://github.com/HeliosZhao/NCDSS **Deep Hierarchical Semantic Segmentation** - Paper: https://arxiv.org/abs/2203.14335 - Code: https://github.com/0liliulei/HieraSeg **Rethinking Semantic Segmentation: A Prototype View** - Paper(Oral): https://arxiv.org/abs/2203.15102 - Code: https://github.com/tfzhou/ProtoSeg ## Weakly Supervised Semantic Segmentation **Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation** - Paper: https://arxiv.org/abs/2203.00962 - Code: https://github.com/zhaozhengChen/ReCAM **Multi-class Token Transformer for Weakly Supervised Semantic Segmentation** - Paper: https://arxiv.org/abs/2203.02891 - Code: https://github.com/xulianuwa/MCTformer **Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers** - Paper: https://arxiv.org/abs/2203.02664 - Code: https://github.com/rulixiang/afa **CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation** - Paper: https://arxiv.org/abs/2203.02668 - Code: https://github.com/CVI-SZU/CLIMS **CCAM: Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation** - Paper: https://arxiv.org/abs/2203.13505 - Code: https://github.com/CVI-SZU/CCAM **FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation** - Homepage: http://cvlab.postech.ac.kr/research/FIFO/ - Paper(Oral): https://arxiv.org/abs/2204.01587 - Code: https://github.com/sohyun-l/FIFO **Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation** - Paper: https://arxiv.org/abs/2203.09653 - Code: https://github.com/maeve07/RCA.git ## Semi-Supervised Semantic Segmentation **ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation** - Paper: https://arxiv.org/abs/2106.05095 - Code:
https://github.com/LiheYoung/ST-PlusPlus - Chinese explainer: https://mp.weixin.qq.com/s/knSnlebdtEnmrkChGM_0CA **Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels** - Homepage: https://haochen-wang409.github.io/U2PL/ - Paper: https://arxiv.org/abs/2203.03884 - Code: https://github.com/Haochen-Wang409/U2PL - Chinese explainer: https://mp.weixin.qq.com/s/-08olqE7np8A1XQzt6HAgQ **Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation** - Paper: https://arxiv.org/pdf/2111.12903.pdf - Code: https://github.com/yyliu01/PS-MT ## Domain Adaptive Semantic Segmentation **Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation** - Paper: https://arxiv.org/abs/2111.12940 - Code: https://github.com/BIT-DA/RIPU **DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation** - Paper: https://arxiv.org/abs/2111.14887 - Code: https://github.com/lhoyer/DAFormer ## Unsupervised Semantic Segmentation **GroupViT: Semantic Segmentation Emerges from Text Supervision** - Homepage: https://jerryxu.net/GroupViT/ - Paper: https://arxiv.org/abs/2202.11094 - Demo: https://youtu.be/DtJsWIUTW-Y ## Few-Shot Semantic Segmentation **Generalized Few-shot Semantic Segmentation** - Paper: https://jiaya.me/papers/cvpr22_zhuotao.pdf - Code: https://github.com/dvlab-research/GFS-Seg <a name="Instance-Segmentation"></a> # Instance Segmentation **BoxeR: Box-Attention for 2D and 3D Transformers** - Paper: https://arxiv.org/abs/2111.13087 - Code: https://github.com/kienduynguyen/BoxeR - Chinese explainer: https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w **E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation** - Paper: https://arxiv.org/abs/2203.04074 - Code: https://github.com/zhang-tao-whu/e2ec **Mask Transfiner for High-Quality Instance Segmentation** - Paper: https://arxiv.org/abs/2111.13673 - Code: https://github.com/SysCV/transfiner **Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned
Pairwise Affinity** - Homepage: https://sites.google.com/view/generic-grouping/ - Paper: https://arxiv.org/abs/2204.06107 - Code: https://github.com/facebookresearch/Generic-Grouping ## Self-Supervised Instance Segmentation **FreeSOLO: Learning to Segment Objects without Annotations** - Paper: https://arxiv.org/abs/2202.12181 - Code: https://github.com/NVlabs/FreeSOLO ## Video Instance Segmentation **Efficient Video Instance Segmentation via Tracklet Query and Proposal** - Homepage: https://jialianwu.com/projects/EfficientVIS.html - Paper: https://arxiv.org/abs/2203.01853 - Demo: https://youtu.be/sSPMzgtMKCE **Temporally Efficient Vision Transformer for Video Instance Segmentation** - Paper: https://arxiv.org/abs/2204.08412 - Code: https://github.com/hustvl/TeViT <a name="Panoptic-Segmentation"></a> # Panoptic Segmentation **Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers** - Paper: https://arxiv.org/abs/2109.03814 - Code: https://github.com/zhiqi-li/Panoptic-SegFormer **Large-scale Video Panoptic Segmentation in the Wild: A Benchmark** - Paper: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset/blob/main/VIPSeg2022.pdf - Code: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset - Dataset: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset <a name="FFC"></a> # Few-Shot Classification **Integrative Few-Shot Learning for Classification and Segmentation** - Paper: https://arxiv.org/abs/2203.15712 - Code: https://github.com/dahyun-kang/ifsl **Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification** - Paper: https://arxiv.org/abs/2106.05517 - Code: https://github.com/LouieYang/MCL <a name="FFS"></a> # Few-Shot Segmentation **Learning What Not to Segment: A New Perspective on Few-Shot Segmentation** - Paper: https://arxiv.org/abs/2203.07615 - Code: https://github.com/chunbolang/BAM **Integrative Few-Shot Learning for Classification and Segmentation** - Paper: https://arxiv.org/abs/2203.15712 - Code: https://github.com/dahyun-kang/ifsl **Dynamic
Prototype Convolution Network for Few-Shot Semantic Segmentation** - Paper: https://arxiv.org/abs/2204.10638 - Code: None <a name="Matting"></a> # Image Matting **Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation** - Paper: https://arxiv.org/abs/2201.06889 - Code: None <a name="VU"></a> # Video Understanding **Self-supervised Video Transformer** - Homepage: https://kahnchana.github.io/svt/ - Paper: https://arxiv.org/abs/2112.01514 - Code: https://github.com/kahnchana/svt **TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting** - Paper(Oral): https://arxiv.org/abs/2204.01018 - Code: https://github.com/SvipRepetitionCounting/TransRAC **FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment** - Paper(Oral): https://arxiv.org/abs/2204.03646 - Dataset: https://github.com/xujinglin/FineDiving - Code: https://github.com/xujinglin/FineDiving - Chinese explainer: https://mp.weixin.qq.com/s/8t12Y34eMNwvJr8PeryWXg **Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition** - Paper(Oral): https://arxiv.org/abs/2204.02148 - Code: None ## Action Recognition **Spatio-temporal Relation Modeling for Few-shot Action Recognition** - Paper: https://arxiv.org/abs/2112.05132 - Code: https://github.com/Anirudh257/strm ## Action Detection **End-to-End Semi-Supervised Learning for Video Action Detection** - Paper: https://arxiv.org/abs/2203.04251 - Code: None <a name="Image-Editing"></a> # Image Editing **Style Transformer for Image Inversion and Editing** - Paper: https://arxiv.org/abs/2203.07932 - Code: https://github.com/sapphire497/style-transformer **Blended Diffusion for Text-driven Editing of Natural Images** - Paper: https://arxiv.org/abs/2111.14818 - Code: https://github.com/omriav/blended-diffusion **SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing** - Homepage:
https://semanticstylegan.github.io/ - Paper: https://arxiv.org/abs/2112.02236 - Demo: https://semanticstylegan.github.io/videos/demo.mp4 <a name="LLV"></a> # Low-level Vision **ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior** - Paper: https://arxiv.org/abs/2111.15362 - Code: None **Restormer: Efficient Transformer for High-Resolution Image Restoration** - Paper: https://arxiv.org/abs/2111.09881 - Code: http