Repository: amusi/CVPR2026-Papers-with-Code
Branch: main
Commit: 5709455e269a
Files: 9
Total size: 316.0 KB

Directory structure:
gitextract_f2cckni0/
├── CVPR2019-Papers-with-Code.md
├── CVPR2020-Papers-with-Code.md
├── CVPR2021-Papers-with-Code.md
├── CVPR2022-Papers-with-Code.md
├── CVPR2023-Papers-with-Code.md
├── CVPR2024-Papers-with-Code.md
├── CVPR2025-Papers-with-Code.md
├── README.md
└── master

================================================
FILE CONTENTS
================================================

================================================
FILE: CVPR2019-Papers-with-Code.md
================================================
# CVPR2019-Code

A collection of open-source projects for CVPR 2019 papers

See also: [Collection of open-source projects for CVPR 2020 papers](https://github.com/amusi/CVPR2020-Code)

Appendix: [Code links for 530 CVPR 2019 papers](./CVPR2019_CodeLink.csv)

- [Object Detection](#Object-Detection)
- [Object Tracking](#Object-Tracking)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [GAN](#GAN)
- [Face Detection](#Face-Detection)
- [Human Pose Estimation](#Human-Pose-Estimation)
- [6DoF Pose Estimation](#6DoF-Pose-Estimation)
- [Head Pose Estimation](#Head-Pose-Estimation)
- [Crowd Counting](#Crowd-Counting)

**Changelog:**

- 20200226: Added [Collection of open-source projects for CVPR 2020 papers](https://github.com/amusi/CVPR2020-Code)
- 20191026: Added [code links for 530 papers](./CVPR2019_CodeLink.csv)
- 20190405: Added 8 papers (object detection, semantic segmentation, and other areas)
- 20190408: Added 6 papers (object tracking, GAN, 6DoF pose estimation, and other areas)

# Object Detection

**Bounding Box Regression with Uncertainty for Accurate Object Detection**

- arXiv:
- github:

# Object Tracking

**Fast Online Object Tracking and Segmentation: A Unifying Approach**

- arXiv:
- github:
- homepage:

**Unsupervised Deep Tracking**

- arXiv:
- github:
- github(PyTorch):

**Target-Aware Deep Tracking**

- arXiv:
- homepage:

# Semantic Segmentation

**Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation**

- arXiv:
- github: https://github.com/LinZhuoChen/DUpsampling (unofficial)

**Dual Attention Network for Scene Segmentation**

- arXiv:
- github:

**Collaborative
Global-Local Networks for Memory-Efficient Segmentation of Ultra-High Resolution Images**

- arXiv: None
- github:

# Instance Segmentation

**Mask Scoring R-CNN**

- arXiv:
- github:

# GAN

**Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis**

- arXiv:
- github:

# Face Detection

**DSFD: Dual Shot Face Detector**

- arXiv:
- github:

# Human Pose Estimation

**Deep High-Resolution Representation Learning for Human Pose Estimation**

- arXiv:
- github:

# 6DoF Pose Estimation

**PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation**

- arXiv:
- github:

# Head Pose Estimation

**PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation**

- paper:
- github:

# Crowd Counting

**Learning from Synthetic Data for Crowd Counting in the Wild**

- arXiv:
- github:
- homepage:

================================================
FILE: CVPR2020-Papers-with-Code.md
================================================
# CVPR2020-Code

A collection of open-source projects for [CVPR 2020](https://openaccess.thecvf.com/CVPR2020) papers. Issues that share CVPR 2020 open-source projects are welcome.

**Recommended reading**

- [CVPR 2020 virtual](http://cvpr20.com/)
- The collection of open-source projects for ECCV 2020 papers is now available: https://github.com/amusi/ECCV2020-Code
- For papers from past top CV conferences (e.g., ECCV 2020, CVPR 2019, ICCV 2019), other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision

**Index of CVPR 2020 papers with open-source code**

- [CNN](#CNN)
- [Image Classification](#Image-Classification)
- [Video Classification](#Video-Classification)
- [Object Detection](#Object-Detection)
- [3D Object Detection](#3D-Object-Detection)
- [Video Object Detection](#Video-Object-Detection)
- [Object Tracking](#Object-Tracking)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Panoptic Segmentation](#Panoptic-Segmentation)
- [Video Object Segmentation](#VOS)
- [Superpixel Segmentation](#Superpixel)
- [Interactive Image Segmentation](#IIS)
- [NAS](#NAS)
- [GAN](#GAN)
- [Re-ID](#Re-ID)
- [3D Point Clouds (classification/segmentation/registration/tracking, etc.)](#3D-PointCloud)
- [Face (recognition/detection/reconstruction, etc.)](#Face)
- [Human Pose Estimation (2D/3D)](#Human-Pose-Estimation)
- [Human Parsing](#Human-Parsing)
- [Scene Text Detection](#Scene-Text-Detection)
- [Scene Text Recognition](#Scene-Text-Recognition)
- [Feature (Point) Detection and Description](#Feature)
- [Super-Resolution](#Super-Resolution)
- [Model Compression/Pruning](#Model-Compression)
- [Video Understanding/Action Recognition](#Action-Recognition)
- [Crowd Counting](#Crowd-Counting)
- [Depth Estimation](#Depth-Estimation) -
[6D Object Pose Estimation](#6DOF)
- [Hand Pose Estimation](#Hand-Pose)
- [Saliency Detection](#Saliency)
- [Denoising](#Denoising)
- [Deraining](#Deraining)
- [Deblurring](#Deblurring)
- [Dehazing](#Dehazing)
- [Feature Point Detection and Description](#Feature)
- [Visual Question Answering (VQA)](#VQA)
- [Video Question Answering (VideoQA)](#VideoQA)
- [Vision-Language Navigation](#VLN)
- [Video Compression](#Video-Compression)
- [Video Frame Interpolation](#Video-Frame-Interpolation)
- [Style Transfer](#Style-Transfer)
- [Lane Detection](#Lane-Detection)
- [Human-Object Interaction (HOI) Detection](#HOI)
- [Trajectory Prediction](#TP)
- [Motion Prediction](#Motion-Predication)
- [Optical Flow Estimation](#OF)
- [Image Retrieval](#IR)
- [Virtual Try-On](#Virtual-Try-On)
- [HDR](#HDR)
- [Adversarial Examples](#AE)
- [3D Reconstruction](#3D-Reconstructing)
- [Depth Completion](#DC)
- [Semantic Scene Completion](#SSC)
- [Image/Video Captioning](#Captioning)
- [Wireframe Parsing](#WP)
- [Datasets](#Datasets)
- [Others](#Others)
- [Acceptance Unconfirmed](#Not-Sure)

# CNN

**Exploring Self-attention for Image Recognition**

- Paper: https://hszhao.github.io/papers/cvpr20_san.pdf
- Code: https://github.com/hszhao/SAN

**Improving Convolutional Networks with Self-Calibrated Convolutions**

- Homepage: https://mmcheng.net/scconv/
- Paper: http://mftp.mmcheng.net/Papers/20cvprSCNet.pdf
- Code: https://github.com/backseason/SCNet

**Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets**

- Paper: https://arxiv.org/abs/2003.13549
- Code: https://github.com/zeiss-microscopy/BSConv

# Image Classification

**Interpretable and Accurate Fine-grained Recognition via Region Grouping**

- Paper: https://arxiv.org/abs/2005.10411
- Code: https://github.com/zxhuang1698/interpretability-by-parts

**Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion**

- Paper: https://arxiv.org/abs/2003.04490
- Code: https://github.com/AdamKortylewski/CompositionalNets

**Spatially Attentive Output Layer for Image Classification**

- Paper: https://arxiv.org/abs/2004.07570
- Code (apparently removed by the original authors): https://github.com/ildoonet/spatially-attentive-output-layer

# Video Classification

**SmallBigNet: Integrating Core and Contextual Views for Video Classification**

- Paper: https://arxiv.org/abs/2006.14582
- Code: https://github.com/xhl-video/SmallBigNet

# Object Detection

**Overcoming Classifier Imbalance for Long-tail Object Detection
with Balanced Group Softmax**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Li_Overcoming_Classifier_Imbalance_for_Long-Tail_Object_Detection_With_Balanced_Group_CVPR_2020_paper.pdf
- Code: https://github.com/FishYuLi/BalancedGroupSoftmax

**AugFPN: Improving Multi-scale Feature Learning for Object Detection**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Guo_AugFPN_Improving_Multi-Scale_Feature_Learning_for_Object_Detection_CVPR_2020_paper.pdf
- Code: https://github.com/Gus-Guo/AugFPN

**Noise-Aware Fully Webly Supervised Object Detection**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Shen_Noise-Aware_Fully_Webly_Supervised_Object_Detection_CVPR_2020_paper.html
- Code: https://github.com/shenyunhang/NA-fWebSOD/

**Learning a Unified Sample Weighting Network for Object Detection**

- Paper: https://arxiv.org/abs/2006.06568
- Code: https://github.com/caiqi/sample-weighting-network

**D2Det: Towards High Quality Object Detection and Instance Segmentation**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Cao_D2Det_Towards_High_Quality_Object_Detection_and_Instance_Segmentation_CVPR_2020_paper.pdf
- Code: https://github.com/JialeCao001/D2Det

**Dynamic Refinement Network for Oriented and Densely Packed Object Detection**

- Paper download link: https://arxiv.org/abs/2005.09973
- Code and dataset: https://github.com/Anymake/DRN_CVPR2020

**Scale-Equalizing Pyramid Convolution for Object Detection**

- Paper: https://arxiv.org/abs/2005.03101
- Code: https://github.com/jshilong/SEPC

**Revisiting the Sibling Head in Object Detector**

- Paper: https://arxiv.org/abs/2003.07540
- Code: https://github.com/Sense-X/TSD

**Detection in Crowded Scenes: One Proposal, Multiple Predictions**

- Paper: https://arxiv.org/abs/2003.09163
- Code: https://github.com/megvii-model/CrowdDetection

**Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection** -
Paper: https://arxiv.org/abs/2004.04725
- Code: https://github.com/NVlabs/wetectron

**Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection**

- Paper: https://arxiv.org/abs/1912.02424
- Code: https://github.com/sfzhang15/ATSS

**BiDet: An Efficient Binarized Object Detector**

- Paper: https://arxiv.org/abs/2003.03961
- Code: https://github.com/ZiweiWangTHU/BiDet

**Harmonizing Transferability and Discriminability for Adapting Object Detectors**

- Paper: https://arxiv.org/abs/2003.06297
- Code: https://github.com/chaoqichen/HTCN

**CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection**

- Paper: https://arxiv.org/abs/2003.09119
- Code: https://github.com/KiveeDong/CentripetalNet

**Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection**

- Paper: https://arxiv.org/abs/2003.11818
- Code: https://github.com/ggjy/HitDet.pytorch

**EfficientDet: Scalable and Efficient Object Detection**

- Paper: https://arxiv.org/abs/1911.09070
- Code: https://github.com/google/automl/tree/master/efficientdet

# 3D Object Detection

**SESS: Self-Ensembling Semi-Supervised 3D Object Detection**

- Paper: https://arxiv.org/abs/1912.11803
- Code: https://github.com/Na-Z/sess

**Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection**

- Paper: https://arxiv.org/abs/2006.04356
- Code: https://github.com/dleam/Associate-3Ddet

**What You See is What You Get: Exploiting Visibility for 3D Object Detection**

- Homepage: https://www.cs.cmu.edu/~peiyunh/wysiwyg/
- Paper: https://arxiv.org/abs/1912.04986
- Code: https://github.com/peiyunh/wysiwyg

**Learning Depth-Guided Convolutions for Monocular 3D Object Detection**

- Paper: https://arxiv.org/abs/1912.04799
- Code: https://github.com/dingmyu/D4LCN

**Structure Aware Single-stage 3D Object Detection from Point Cloud**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/He_Structure_Aware_Single-Stage_3D_Object_Detection_From_Point_Cloud_CVPR_2020_paper.html
- Code: https://github.com/skyhehe123/SA-SSD

**IDA-3D:
Instance-Depth-Aware 3D Object Detection from Stereo Vision for Autonomous Driving**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Peng_IDA-3D_Instance-Depth-Aware_3D_Object_Detection_From_Stereo_Vision_for_Autonomous_CVPR_2020_paper.pdf
- Code: https://github.com/swords123/IDA-3D

**Train in Germany, Test in The USA: Making 3D Object Detectors Generalize**

- Paper: https://arxiv.org/abs/2005.08139
- Code: https://github.com/cxy1997/3D_adapt_auto_driving

**MLCVNet: Multi-Level Context VoteNet for 3D Object Detection**

- Paper: https://arxiv.org/abs/2004.05679
- Code: https://github.com/NUAAXQ/MLCVNet

**3DSSD: Point-based 3D Single Stage Object Detector**

- CVPR 2020 Oral
- Paper: https://arxiv.org/abs/2002.10187
- Code: https://github.com/tomztyang/3DSSD

**Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation**

- Paper: https://arxiv.org/abs/2004.03572
- Code: https://github.com/zju3dv/disprcn

**End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection**

- Paper: https://arxiv.org/abs/2004.03080
- Code: https://github.com/mileyan/pseudo-LiDAR_e2e

**DSGN: Deep Stereo Geometry Network for 3D Object Detection**

- Paper: https://arxiv.org/abs/2001.03398
- Code: https://github.com/chenyilun95/DSGN

**LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention**

- Paper: https://arxiv.org/abs/2004.01389
- Code: https://github.com/yinjunbo/3DVID

**PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection**

- Paper: https://arxiv.org/abs/1912.13192
- Code: https://github.com/sshaoshuai/PV-RCNN

**Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud**

- Paper: https://arxiv.org/abs/2003.01251
- Code: https://github.com/WeijingShi/Point-GNN

# Video Object Detection

**Memory Enhanced Global-Local Aggregation for Video Object Detection**

- Paper: https://arxiv.org/abs/2003.12063
- Code: https://github.com/Scalsol/mega.pytorch

# Object Tracking

**SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking** -
Paper: https://arxiv.org/abs/1911.07241
- Code: https://github.com/ohhhyeahhh/SiamCAR

**D3S -- A Discriminative Single Shot Segmentation Tracker**

- Paper: https://arxiv.org/abs/1911.08862
- Code: https://github.com/alanlukezic/d3s

**ROAM: Recurrently Optimizing Tracking Model**

- Paper: https://arxiv.org/abs/1907.12006
- Code: https://github.com/skyoung/ROAM

**Siam R-CNN: Visual Tracking by Re-Detection**

- Homepage: https://www.vision.rwth-aachen.de/page/siamrcnn
- Paper: https://arxiv.org/abs/1911.12836
- Paper 2: https://www.vision.rwth-aachen.de/media/papers/192/siamrcnn.pdf
- Code: https://github.com/VisualComputingInstitute/SiamR-CNN

**Cooling-Shrinking Attack: Blinding the Tracker with Imperceptible Noises**

- Paper: https://arxiv.org/abs/2003.09595
- Code: https://github.com/MasterBin-IIAU/CSA

**High-Performance Long-Term Tracking with Meta-Updater**

- Paper: https://arxiv.org/abs/2004.00305
- Code: https://github.com/Daikenan/LTMU

**AutoTrack: Towards High-Performance Visual Tracking for UAV with Automatic Spatio-Temporal Regularization**

- Paper: https://arxiv.org/abs/2003.12949
- Code: https://github.com/vision4robotics/AutoTrack

**Probabilistic Regression for Visual Tracking**

- Paper: https://arxiv.org/abs/2003.12565
- Code: https://github.com/visionml/pytracking

**MAST: A Memory-Augmented Self-supervised Tracker**

- Paper: https://arxiv.org/abs/2002.07793
- Code: https://github.com/zlai0/MAST

**Siamese Box Adaptive Network for Visual Tracking**

- Paper: https://arxiv.org/abs/2003.06761
- Code: https://github.com/hqucv/siamban

## Multi-Object Tracking

**3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset**

- Homepage: https://vap.aau.dk/3d-zef/
- Paper: https://arxiv.org/abs/2006.08466
- Code: https://bitbucket.org/aauvap/3d-zef/src/master/
- Dataset: https://motchallenge.net/data/3D-ZeF20

# Semantic Segmentation

**FDA: Fourier Domain Adaptation for Semantic Segmentation**

- Paper: https://arxiv.org/abs/2004.05498
- Code: https://github.com/YanchaoYang/FDA

**Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation**

- Paper: not yet available
- Code: https://github.com/JianqiangWan/Super-BPD
**Single-Stage Semantic Segmentation from Image Labels**

- Paper: https://arxiv.org/abs/2005.08104
- Code: https://github.com/visinf/1-stage-wseg

**Learning Texture Invariant Representation for Domain Adaptation of Semantic Segmentation**

- Paper: https://arxiv.org/abs/2003.00867
- Code: https://github.com/MyeongJin-Kim/Learning-Texture-Invariant-Representation

**MSeg: A Composite Dataset for Multi-domain Semantic Segmentation**

- Paper: http://vladlen.info/papers/MSeg.pdf
- Code: https://github.com/mseg-dataset/mseg-api

**CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement**

- Paper: https://arxiv.org/abs/2005.02551
- Code: https://github.com/hkchengrex/CascadePSP

**Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision**

- Oral
- Paper: https://arxiv.org/abs/2004.07703
- Code: https://github.com/feipan664/IntraDA

**Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2004.04581
- Code: https://github.com/YudeWang/SEAM

**Temporally Distributed Networks for Fast Video Segmentation**

- Paper: https://arxiv.org/abs/2004.01800
- Code: https://github.com/feinanshan/TDNet

**Context Prior for Scene Segmentation**

- Paper: https://arxiv.org/abs/2004.01547
- Code: https://git.io/ContextPrior

**Strip Pooling: Rethinking Spatial Pooling for Scene Parsing**

- Paper: https://arxiv.org/abs/2003.13328
- Code: https://github.com/Andrew-Qibin/SPNet

**Cars Can't Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks**

- Paper: https://arxiv.org/abs/2003.05128
- Code: https://github.com/shachoi/HANet

**Learning Dynamic Routing for Semantic Segmentation**

- Paper: https://arxiv.org/abs/2003.10401
- Code: https://github.com/yanwei-li/DynamicRouting

# Instance Segmentation

**D2Det: Towards High Quality Object Detection and Instance Segmentation** -
Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Cao_D2Det_Towards_High_Quality_Object_Detection_and_Instance_Segmentation_CVPR_2020_paper.pdf
- Code: https://github.com/JialeCao001/D2Det

**PolarMask: Single Shot Instance Segmentation with Polar Representation**

- Paper: https://arxiv.org/abs/1909.13226
- Code: https://github.com/xieenze/PolarMask
- Explainer: https://zhuanlan.zhihu.com/p/84890413

**CenterMask: Real-Time Anchor-Free Instance Segmentation**

- Paper: https://arxiv.org/abs/1911.06667
- Code: https://github.com/youngwanLEE/CenterMask

**BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation**

- Paper: https://arxiv.org/abs/2001.00309
- Code: https://github.com/aim-uofa/AdelaiDet

**Deep Snake for Real-Time Instance Segmentation**

- Paper: https://arxiv.org/abs/2001.01629
- Code: https://github.com/zju3dv/snake

**Mask Encoding for Single Shot Instance Segmentation**

- Paper: https://arxiv.org/abs/2003.11712
- Code: https://github.com/aim-uofa/AdelaiDet

# Panoptic Segmentation

**Video Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2006.11339
- Code: https://github.com/mcahny/vps
- Dataset: https://www.dropbox.com/s/ecem4kq0fdkver4/cityscapes-vps-dataset-1.0.zip?dl=0

**Pixel Consensus Voting for Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2004.01849
- Code: not yet released

**BANet: Bidirectional Aggregation Network with Occlusion Handling for Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2003.14031
- Code: https://github.com/Mooonside/BANet

# Video Object Segmentation

**A Transductive Approach for Video Object Segmentation**

- Paper: https://arxiv.org/abs/2004.07193
- Code: https://github.com/microsoft/transductive-vos.pytorch

**State-Aware Tracker for Real-Time Video Object Segmentation**

- Paper: https://arxiv.org/abs/2003.00482
- Code: https://github.com/MegviiDetection/video_analyst

**Learning Fast and Robust Target Models for Video Object Segmentation**

- Paper: https://arxiv.org/abs/2003.00908
- Code: https://github.com/andr345/frtm-vos

**Learning Video Object Segmentation from Unlabeled Videos**

- Paper: https://arxiv.org/abs/2003.05020 -
Code: https://github.com/carrierlxk/MuG

# Superpixel Segmentation

**Superpixel Segmentation with Fully Convolutional Networks**

- Paper: https://arxiv.org/abs/2003.12929
- Code: https://github.com/fuy34/superpixel_fcn

# Interactive Image Segmentation

**Interactive Object Segmentation with Inside-Outside Guidance**

- Paper download link: http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Interactive_Object_Segmentation_With_Inside-Outside_Guidance_CVPR_2020_paper.pdf
- Code: https://github.com/shiyinzhang/Inside-Outside-Guidance
- Dataset: https://github.com/shiyinzhang/Pixel-ImageNet

# NAS

**AOWS: Adaptive and optimal network width search with latency constraints**

- Paper: https://arxiv.org/abs/2005.10481
- Code: https://github.com/bermanmaxim/AOWS

**Densely Connected Search Space for More Flexible Neural Architecture Search**

- Paper: https://arxiv.org/abs/1906.09607
- Code: https://github.com/JaminFong/DenseNAS

**MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning**

- Paper: https://arxiv.org/abs/2003.14058
- Code: https://github.com/bhpfelix/MTLNAS

**FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions**

- Paper download link: https://arxiv.org/abs/2004.05565
- Code: https://github.com/facebookresearch/mobile-vision

**Neural Architecture Search for Lightweight Non-Local Networks**

- Paper: https://arxiv.org/abs/2004.01961
- Code: https://github.com/LiYingwei/AutoNL

**Rethinking Performance Estimation in Neural Architecture Search**

- Paper: https://arxiv.org/abs/2005.09917
- Code: https://github.com/zhengxiawu/rethinking_performance_estimation_in_NAS
- Explainer 1: https://www.zhihu.com/question/372070853/answer/1035234510
- Explainer 2: https://zhuanlan.zhihu.com/p/111167409

**CARS: Continuous Evolution for Efficient Neural Architecture Search**

- Paper: https://arxiv.org/abs/1909.04977
- Code (to be released): https://github.com/huawei-noah/CARS

# GAN

**SEAN: Image Synthesis with Semantic Region-Adaptive Normalization**

- Paper: https://arxiv.org/abs/1911.12861
- Code: https://github.com/ZPdesu/SEAN

**Reusing Discriminators for Encoding:
Towards Unsupervised Image-to-Image Translation**

- Paper link: http://openaccess.thecvf.com/content_CVPR_2020/html/Chen_Reusing_Discriminators_for_Encoding_Towards_Unsupervised_Image-to-Image_Translation_CVPR_2020_paper.html
- Code link: https://github.com/alpc91/NICE-GAN-pytorch

**Distribution-induced Bidirectional Generative Adversarial Network for Graph Representation Learning**

- Paper: https://arxiv.org/abs/1912.01899
- Code: https://github.com/SsGood/DBGAN

**PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer**

- Paper: https://arxiv.org/abs/1909.06956
- Code: https://github.com/wtjiang98/PSGAN

**Semantically Multi-modal Image Synthesis**

- Homepage: http://seanseattle.github.io/SMIS
- Paper: https://arxiv.org/abs/2003.12697
- Code: https://github.com/Seanseattle/SMIS

**Unpaired Portrait Drawing Generation via Asymmetric Cycle Mapping**

- Paper: https://yiranran.github.io/files/CVPR2020_Unpaired%20Portrait%20Drawing%20Generation%20via%20Asymmetric%20Cycle%20Mapping.pdf
- Code: https://github.com/yiranran/Unpaired-Portrait-Drawing

**Learning to Cartoonize Using White-box Cartoon Representations**

- Paper: https://github.com/SystemErrorWang/White-box-Cartoonization/blob/master/paper/06791.pdf
- Homepage: https://systemerrorwang.github.io/White-box-Cartoonization/
- Code: https://github.com/SystemErrorWang/White-box-Cartoonization
- Explainer: https://zhuanlan.zhihu.com/p/117422157
- Demo video: https://www.bilibili.com/video/av56708333

**GAN Compression: Efficient Architectures for Interactive Conditional GANs**

- Paper: https://arxiv.org/abs/2003.08936
- Code: https://github.com/mit-han-lab/gan-compression

**Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions**

- Paper: https://arxiv.org/abs/2003.01826
- Code: https://github.com/cc-hpc-itwm/UpConv

# Re-ID

**High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification** -
Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Wang_High-Order_Information_Matters_Learning_Relation_and_Topology_for_Occluded_Person_CVPR_2020_paper.html
- Code: https://github.com/wangguanan/HOReID

**COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification**

- Paper: https://arxiv.org/abs/2005.07862
- Dataset: not yet available

**Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification With Deep Mis-Ranking**

- Paper: https://arxiv.org/abs/2004.04199
- Code: https://github.com/whj363636/Adversarial-attack-on-Person-ReID-With-Deep-Mis-Ranking

**Pose-guided Visible Part Matching for Occluded Person ReID**

- Paper: https://arxiv.org/abs/2004.00230
- Code: https://github.com/hh23333/PVPM

**Weakly supervised discriminative feature learning with state information for person identification**

- Paper: https://arxiv.org/abs/2002.11939
- Code: https://github.com/KovenYu/state-information

# 3D Point Clouds (classification/segmentation/registration, etc.)

## 3D Point Cloud Convolution

**PointASNL: Robust Point Clouds Processing using Nonlocal Neural Networks with Adaptive Sampling**

- Paper: https://arxiv.org/abs/2003.00492
- Code: https://github.com/yanx27/PointASNL

**Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds**

- Paper download link: https://arxiv.org/abs/2003.12971
- Code: https://github.com/raoyongming/PointGLR

**Grid-GCN for Fast and Scalable Point Cloud Learning**

- Paper: https://arxiv.org/abs/1912.02984
- Code: https://github.com/Xharlie/Grid-GCN

**FPConv: Learning Local Flattening for Point Convolution**

- Paper: https://arxiv.org/abs/2002.10701
- Code: https://github.com/lyqun/FPConv

## 3D Point Cloud Classification

**PointAugment: an Auto-Augmentation Framework for Point Cloud Classification**

- Paper: https://arxiv.org/abs/2002.10876
- Code (to be released): https://github.com/liruihui/PointAugment/

## 3D Point Cloud Semantic Segmentation

**RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds**

- Paper: https://arxiv.org/abs/1911.11236
- Code: https://github.com/QingyongHu/RandLA-Net
- Explainer: https://zhuanlan.zhihu.com/p/105433460

**Weakly Supervised Semantic
Point Cloud Segmentation: Towards 10X Fewer Labels**

- Paper: https://arxiv.org/abs/2004.04091
- Code: https://github.com/alex-xun-xu/WeakSupPointCloudSeg

**PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation**

- Paper: https://arxiv.org/abs/2003.14032
- Code: https://github.com/edwardzhou130/PolarSeg

**Learning to Segment 3D Point Clouds in 2D Image Space**

- Paper: https://arxiv.org/abs/2003.05593
- Code: https://github.com/WPI-VISLab/Learning-to-Segment-3D-Point-Clouds-in-2D-Image-Space

## 3D Point Cloud Instance Segmentation

**PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation**

- Paper: https://arxiv.org/abs/2004.01658
- Code: https://github.com/Jia-Research-Lab/PointGroup

## 3D Point Cloud Registration

**Feature-metric Registration: A Fast Semi-supervised Approach for Robust Point Cloud Registration without Correspondences**

- Paper: https://arxiv.org/abs/2005.01014
- Code: https://github.com/XiaoshuiHuang/fmr

**D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features**

- Paper: https://arxiv.org/abs/2003.03164
- Code: https://github.com/XuyangBai/D3Feat

**RPM-Net: Robust Point Matching using Learned Features**

- Paper: https://arxiv.org/abs/2003.13479
- Code: https://github.com/yewzijian/RPMNet

## 3D Point Cloud Completion

**Cascaded Refinement Network for Point Cloud Completion**

- Paper: https://arxiv.org/abs/2004.03327
- Code: https://github.com/xiaogangw/cascaded-point-completion

## 3D Point Cloud Object Tracking

**P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds**

- Paper: https://arxiv.org/abs/2005.13888
- Code: https://github.com/HaozheQi/P2B

## Others

**An Efficient PointLSTM for Point Clouds Based Gesture Recognition**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Min_An_Efficient_PointLSTM_for_Point_Clouds_Based_Gesture_Recognition_CVPR_2020_paper.html
- Code: https://github.com/Blueprintf/pointlstm-gesture-recognition-pytorch

# Face

## Face Recognition

**CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition**

- Paper: https://arxiv.org/abs/2004.00288
- Code: https://github.com/HuangYG123/CurricularFace
**Learning Meta Face Recognition in Unseen Domains**

- Paper: https://arxiv.org/abs/2003.07733
- Code: https://github.com/cleardusk/MFR
- Explainer: https://mp.weixin.qq.com/s/YZoEnjpnlvb90qSI3xdJqQ

## Face Detection

## Face Anti-Spoofing

**Searching Central Difference Convolutional Networks for Face Anti-Spoofing**

- Paper: https://arxiv.org/abs/2003.04092
- Code: https://github.com/ZitongYu/CDCN

## Facial Expression Recognition

**Suppressing Uncertainties for Large-Scale Facial Expression Recognition**

- Paper: https://arxiv.org/abs/2002.10392
- Code (to be released): https://github.com/kaiwang960112/Self-Cure-Network

## Face Frontalization

**Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images**

- Paper: https://arxiv.org/abs/2003.08124
- Code: https://github.com/Hangz-nju-cuhk/Rotate-and-Render

## 3D Face Reconstruction

**AvatarMe: Realistically Renderable 3D Facial Reconstruction "in-the-wild"**

- Paper: https://arxiv.org/abs/2003.13845
- Dataset: https://github.com/lattas/AvatarMe

**FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction**

- Paper: https://arxiv.org/abs/2003.13989
- Code: https://github.com/zhuhao-nju/facescape

# Human Pose Estimation (2D/3D)

## 2D Human Pose Estimation

**TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting**

- Homepage: https://yzhq97.github.io/transmomo/
- Paper: https://arxiv.org/abs/2003.14401
- Code: https://github.com/yzhq97/transmomo.pytorch

**HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation**

- Paper: https://arxiv.org/abs/1908.10357
- Code: https://github.com/HRNet/HigherHRNet-Human-Pose-Estimation

**The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation**

- Paper: https://arxiv.org/abs/1911.07524
- Code: https://github.com/HuangJunJie2017/UDP-Pose
- Explainer: https://zhuanlan.zhihu.com/p/92525039

**Distribution-Aware Coordinate Representation for Human Pose Estimation**

- Homepage: https://ilovepose.github.io/coco/
- Paper: https://arxiv.org/abs/1910.06278
- Code: https://github.com/ilovepose/DarkPose

## 3D Human Pose Estimation

**Cascaded Deep Monocular 3D Human Pose Estimation With
Evolutionary Training Data**

- Paper: https://arxiv.org/abs/2006.07778
- Code: https://github.com/Nicholasli1995/EvoSkeleton

**Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach**

- Homepage: https://www.zhe-zhang.com/cvpr2020
- Paper: https://arxiv.org/abs/2003.11163
- Code: https://github.com/CHUNYUWANG/imu-human-pose-pytorch

**Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data**

- Paper download link: https://arxiv.org/abs/2004.01166
- Code: https://github.com/Healthcare-Robotics/bodies-at-rest
- Dataset: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KOA4ML

**Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis**

- Homepage: http://val.cds.iisc.ac.in/pgp-human/
- Paper: https://arxiv.org/abs/2004.04400

**Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation**

- Paper: https://arxiv.org/abs/2004.00329
- Code: https://github.com/fabbrimatteo/LoCO

**VIBE: Video Inference for Human Body Pose and Shape Estimation**

- Paper: https://arxiv.org/abs/1912.05656
- Code: https://github.com/mkocabas/VIBE

**Back to the Future: Joint Aware Temporal Deep Learning 3D Human Pose Estimation**

- Paper: https://arxiv.org/abs/2002.11251
- Code: https://github.com/vnmr/JointVideoPose3D

**Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS**

- Paper: https://arxiv.org/abs/2003.03972
- Dataset: not yet available

# Human Parsing

**Correlating Edge, Pose with Parsing**

- Paper: https://arxiv.org/abs/2005.01431
- Code: https://github.com/ziwei-zh/CorrPM

# Scene Text Detection

**STEFANN: Scene Text Editor using Font Adaptive Neural Network**

- Homepage: https://prasunroy.github.io/stefann/
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Roy_STEFANN_Scene_Text_Editor_Using_Font_Adaptive_Neural_Network_CVPR_2020_paper.html
- Code: https://github.com/prasunroy/stefann
- Dataset: https://drive.google.com/open?id=1sEDiX_jORh2X-HSzUnjIyZr-G9LJIw1k

**ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection** -
Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_ContourNet_Taking_a_Further_Step_Toward_Accurate_Arbitrary-Shaped_Scene_Text_CVPR_2020_paper.pdf
- Code: https://github.com/wangyuxin87/ContourNet

**UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**

- Paper: https://arxiv.org/abs/2003.10608
- Code and dataset: https://github.com/Jyouhou/UnrealText/

**ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network**

- Paper: https://arxiv.org/abs/2002.10200
- Code (to be released): https://github.com/Yuliang-Liu/bezier_curve_text_spotting
- Code (to be released): https://github.com/aim-uofa/adet

**Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection**

- Paper: https://arxiv.org/abs/2003.07493
- Code: https://github.com/GXYM/DRRG

# Scene Text Recognition

**SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition**

- Paper: https://arxiv.org/abs/2005.10977
- Code: https://github.com/Pay20Y/SEED

**UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**

- Paper: https://arxiv.org/abs/2003.10608
- Code and dataset: https://github.com/Jyouhou/UnrealText/

**ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network**

- Paper: https://arxiv.org/abs/2002.10200
- Code (to be released): https://github.com/aim-uofa/adet

**Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition**

- Paper: https://arxiv.org/abs/2003.06606
- Code: https://github.com/Canjie-Luo/Text-Image-Augmentation

# Feature (Point) Detection and Description

**SuperGlue: Learning Feature Matching with Graph Neural Networks**

- Paper: https://arxiv.org/abs/1911.11763
- Code: https://github.com/magicleap/SuperGluePretrainedNetwork

# Super-Resolution

## Image Super-Resolution

**Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Guo_Closed-Loop_Matters_Dual_Regression_Networks_for_Single_Image_Super-Resolution_CVPR_2020_paper.html
- Code: https://github.com/guoyongcs/DRN

**Learning Texture Transformer Network for Image Super-Resolution** -
Paper: https://arxiv.org/abs/2006.04139
- Code: https://github.com/FuzhiYang/TTSR

**Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining**

- Paper: https://arxiv.org/abs/2006.01424
- Code: https://github.com/SHI-Labs/Cross-Scale-Non-Local-Attention

**Structure-Preserving Super Resolution with Gradient Guidance**

- Paper: https://arxiv.org/abs/2003.13081
- Code: https://github.com/Maclory/SPSR

**Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy**

- Paper: https://arxiv.org/abs/2004.00448
- Code: https://github.com/clovaai/cutblur

## Video Super-Resolution

**TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution**

- Paper: https://arxiv.org/abs/1812.02898
- Code: https://github.com/YapengTian/TDAN-VSR-CVPR-2020

**Space-Time-Aware Multi-Resolution Video Enhancement**

- Homepage: https://alterzero.github.io/projects/STAR.html
- Paper: http://arxiv.org/abs/2003.13170
- Code: https://github.com/alterzero/STARnet

**Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution**

- Paper: https://arxiv.org/abs/2002.11616
- Code: https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020

# Model Compression/Pruning

**DMCP: Differentiable Markov Channel Pruning for Neural Networks**

- Paper: https://arxiv.org/abs/2005.03354
- Code: https://github.com/zx55/dmcp

**Forward and Backward Information Retention for Accurate Binary Neural Networks**

- Paper: https://arxiv.org/abs/1909.10788
- Code: https://github.com/htqin/IR-Net

**Towards Efficient Model Compression via Learned Global Ranking**

- Paper: https://arxiv.org/abs/1904.12368
- Code: https://github.com/cmu-enyac/LeGR

**HRank: Filter Pruning using High-Rank Feature Map**

- Paper: http://arxiv.org/abs/2002.10179
- Code: https://github.com/lmbxmu/HRank

**GAN Compression: Efficient Architectures for Interactive Conditional GANs**

- Paper: https://arxiv.org/abs/2003.08936
- Code: https://github.com/mit-han-lab/gan-compression

**Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression** -
论文:https://arxiv.org/abs/2003.08935 - 代码:https://github.com/ofsoundof/group_sparsity # 视频理解/行为识别 **Oops! Predicting Unintentional Action in Video** - 主页:https://oops.cs.columbia.edu/ - 论文:https://arxiv.org/abs/1911.11206 - 代码:https://github.com/cvlab-columbia/oops - 数据集:https://oops.cs.columbia.edu/data **PREDICT & CLUSTER: Unsupervised Skeleton Based Action Recognition** - 论文:https://arxiv.org/abs/1911.12409 - 代码:https://github.com/shlizee/Predict-Cluster **Intra- and Inter-Action Understanding via Temporal Action Parsing** - 论文:https://arxiv.org/abs/2005.10229 - 主页和数据集:https://sdolivia.github.io/TAPOS/ **3DV: 3D Dynamic Voxel for Action Recognition in Depth Video** - 论文:https://arxiv.org/abs/2005.05501 - 代码:https://github.com/3huo/3DV-Action **FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding** - 主页:https://sdolivia.github.io/FineGym/ - 论文:https://arxiv.org/abs/2004.06704 **TEA: Temporal Excitation and Aggregation for Action Recognition** - 论文:https://arxiv.org/abs/2004.01398 - 代码:https://github.com/Phoenix1327/tea-action-recognition **X3D: Expanding Architectures for Efficient Video Recognition** - 论文:https://arxiv.org/abs/2004.04730 - 代码:https://github.com/facebookresearch/SlowFast **Temporal Pyramid Network for Action Recognition** - 主页:https://decisionforce.github.io/TPN - 论文:https://arxiv.org/abs/2004.03548 - 代码:https://github.com/decisionforce/TPN ## 基于骨架的动作识别 **Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition** - 论文:https://arxiv.org/abs/2003.14111 - 代码:https://github.com/kenziyuliu/ms-g3d # 人群计数 # 深度估计 **BiFuse: Monocular 360◦ Depth Estimation via Bi-Projection Fusion** - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_BiFuse_Monocular_360_Depth_Estimation_via_Bi-Projection_Fusion_CVPR_2020_paper.pdf - 代码:https://github.com/Yeh-yu-hsuan/BiFuse **Focus on defocus: bridging the synthetic to real domain gap for depth estimation** - 论文:https://arxiv.org/abs/2005.09623 - 
代码:https://github.com/dvl-tum/defocus-net **Bi3D: Stereo Depth Estimation via Binary Classifications** - 论文:https://arxiv.org/abs/2005.07274 - 代码:https://github.com/NVlabs/Bi3D **AANet: Adaptive Aggregation Network for Efficient Stereo Matching** - 论文:https://arxiv.org/abs/2004.09548 - 代码:https://github.com/haofeixu/aanet **Towards Better Generalization: Joint Depth-Pose Learning without PoseNet** - 论文:https://github.com/B1ueber2y/TrianFlow - 代码:https://github.com/B1ueber2y/TrianFlow ## 单目深度估计 **On the uncertainty of self-supervised monocular depth estimation** - 论文:https://arxiv.org/abs/2005.06209 - 代码:https://github.com/mattpoggi/mono-uncertainty **3D Packing for Self-Supervised Monocular Depth Estimation** - 论文:https://arxiv.org/abs/1905.02693 - 代码:https://github.com/TRI-ML/packnet-sfm - Demo视频:https://www.bilibili.com/video/av70562892/ **Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation** - 论文:https://arxiv.org/abs/2002.12114 - 代码:https://github.com/yzhao520/ARC # 6D目标姿态估计 **PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation** - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/He_PVN3D_A_Deep_Point-Wise_3D_Keypoints_Voting_Network_for_6DoF_CVPR_2020_paper.pdf - 代码:https://github.com/ethnhe/PVN3D **MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion** - 论文:https://arxiv.org/abs/2004.04336 - 代码:https://github.com/wkentaro/morefusion **EPOS: Estimating 6D Pose of Objects with Symmetries** 主页:http://cmp.felk.cvut.cz/epos 论文:https://arxiv.org/abs/2004.00605 **G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features** - 论文:https://arxiv.org/abs/2003.11089 - 代码:https://github.com/DC1991/G2L_Net # 手势估计 **HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation** - 论文:https://arxiv.org/abs/2004.00060 - 主页:http://vision.sice.indiana.edu/projects/hopenet **Monocular Real-time Hand Shape and Motion Capture 
using Multi-modal Data** - 论文:https://arxiv.org/abs/2003.09572 - 代码:https://github.com/CalciferZh/minimal-hand # 显著性检测 **JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection** - 论文:https://arxiv.org/abs/2004.08515 - 代码:https://github.com/kerenfu/JLDCF/ **UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders** - 主页:http://dpfan.net/d3netbenchmark/ - 论文:https://arxiv.org/abs/2004.05763 - 代码:https://github.com/JingZhang617/UCNet # 去噪 **A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising** - 论文:https://arxiv.org/abs/2003.12751 - 代码:https://github.com/Vandermode/NoiseModel **CycleISP: Real Image Restoration via Improved Data Synthesis** - 论文:https://arxiv.org/abs/2003.07761 - 代码:https://github.com/swz30/CycleISP # 去雨 **Multi-Scale Progressive Fusion Network for Single Image Deraining** - 论文:https://arxiv.org/abs/2003.10985 - 代码:https://github.com/kuihua/MSPFN **Detail-recovery Image Deraining via Context Aggregation Networks** - 论文:https://openaccess.thecvf.com/content_CVPR_2020/html/Deng_Detail-recovery_Image_Deraining_via_Context_Aggregation_Networks_CVPR_2020_paper.html - 代码:https://github.com/Dengsgithub/DRD-Net # 去模糊 ## 视频去模糊 **Cascaded Deep Video Deblurring Using Temporal Sharpness Prior** - 主页:https://csbhr.github.io/projects/cdvd-tsp/index.html - 论文:https://arxiv.org/abs/2004.02501 - 代码:https://github.com/csbhr/CDVD-TSP # 去雾 **Domain Adaptation for Image Dehazing** - 论文:https://arxiv.org/abs/2005.04668 - 代码:https://github.com/HUSTSYJ/DA_dahazing **Multi-Scale Boosted Dehazing Network with Dense Feature Fusion** - 论文:https://arxiv.org/abs/2004.13388 - 代码:https://github.com/BookerDeWitt/MSBDN-DFF # 特征点检测与描述 **ASLFeat: Learning Local Features of Accurate Shape and Localization** - 论文:https://arxiv.org/abs/2003.10071 - 代码:https://github.com/lzx551402/aslfeat # 视觉问答(VQA) **VC R-CNN:Visual Commonsense R-CNN** - 论文:https://arxiv.org/abs/2002.12204 - 
代码:https://github.com/Wangt-CN/VC-R-CNN # 视频问答(VideoQA) **Hierarchical Conditional Relation Networks for Video Question Answering** - 论文:https://arxiv.org/abs/2002.10698 - 代码:https://github.com/thaolmk54/hcrn-videoqa # 视觉语言导航 **Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training** - 论文:https://arxiv.org/abs/2002.10638 - 代码(即将开源):https://github.com/weituo12321/PREVALENT # 视频压缩 **Learning for Video Compression with Hierarchical Quality and Recurrent Enhancement** - 论文:https://arxiv.org/abs/2003.01966 - 代码:https://github.com/RenYang-home/HLVC # 视频插帧 **AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation** - 论文:https://arxiv.org/abs/1907.10244 - 代码:https://github.com/HyeongminLEE/AdaCoF-pytorch **FeatureFlow: Robust Video Interpolation via Structure-to-Texture Generation** - 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Gui_FeatureFlow_Robust_Video_Interpolation_via_Structure-to-Texture_Generation_CVPR_2020_paper.html - 代码:https://github.com/CM-BF/FeatureFlow **Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution** - 论文:https://arxiv.org/abs/2002.11616 - 代码:https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020 **Space-Time-Aware Multi-Resolution Video Enhancement** - 主页:https://alterzero.github.io/projects/STAR.html - 论文:http://arxiv.org/abs/2003.13170 - 代码:https://github.com/alterzero/STARnet **Scene-Adaptive Video Frame Interpolation via Meta-Learning** - 论文:https://arxiv.org/abs/2004.00779 - 代码:https://github.com/myungsub/meta-interpolation **Softmax Splatting for Video Frame Interpolation** - 主页:http://sniklaus.com/papers/softsplat - 论文:https://arxiv.org/abs/2003.05534 - 代码:https://github.com/sniklaus/softmax-splatting # 风格迁移 **Diversified Arbitrary Style Transfer via Deep Feature Perturbation** - 论文:https://arxiv.org/abs/1909.08223 - 代码:https://github.com/EndyWon/Deep-Feature-Perturbation **Collaborative Distillation for Ultra-Resolution Universal Style Transfer** - 
Paper: https://arxiv.org/abs/2003.08436 - Code: https://github.com/mingsun-tse/collaborative-distillation # Lane Detection **Inter-Region Affinity Distillation for Road Marking Segmentation** - Paper: https://arxiv.org/abs/2004.05304 - Code: https://github.com/cardwing/Codes-for-IntRA-KD # Human-Object Interaction (HOI) Detection **PPDM: Parallel Point Detection and Matching for Real-time Human-Object Interaction Detection** - Paper: https://arxiv.org/abs/1912.12898 - Code: https://github.com/YueLiao/PPDM **Detailed 2D-3D Joint Representation for Human-Object Interaction** - Paper: https://arxiv.org/abs/2004.08154 - Code: https://github.com/DirtyHarryLYL/DJ-RN **Cascaded Human-Object Interaction Recognition** - Paper: https://arxiv.org/abs/2003.04262 - Code: https://github.com/tfzhou/C-HOI **VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions** - Paper: https://arxiv.org/abs/2003.05541 - Code: https://github.com/ASMIftekhar/VSGNet # Trajectory Prediction **The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction** - Paper: https://arxiv.org/abs/1912.06445 - Code: https://github.com/JunweiLiang/Multiverse - Dataset: https://next.cs.cmu.edu/multiverse/ **Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction** - Paper: https://arxiv.org/abs/2002.11927 - Code: https://github.com/abduallahmohamed/Social-STGCNN # Motion Prediction **Collaborative Motion Prediction via Neural Motion Message Passing** - Paper: https://arxiv.org/abs/2003.06594 - Code: https://github.com/PhyllisH/NMMP **MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps** - Paper: https://arxiv.org/abs/2003.06754 - Code: https://github.com/pxiangwu/MotionNet # Optical Flow Estimation **Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation** - Paper: https://arxiv.org/abs/2003.13045 - Code: https://github.com/lliuz/ARFlow # Image Retrieval **Evade Deep Image Retrieval by Stashing Private Images in the Hash Space** -
Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Xiao_Evade_Deep_Image_Retrieval_by_Stashing_Private_Images_in_the_CVPR_2020_paper.html - Code: https://github.com/sugarruy/hashstash # Virtual Try-On **Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content** - Paper: https://arxiv.org/abs/2003.05863 - Code: https://github.com/switchablenorms/DeepFashion_Try_On # HDR **Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline** - Homepage: https://www.cmlab.csie.ntu.edu.tw/~yulunliu/SingleHDR - Paper: https://www.cmlab.csie.ntu.edu.tw/~yulunliu/SingleHDR_/00942.pdf - Code: https://github.com/alex04072000/SingleHDR # Adversarial Examples **Enhancing Cross-Task Black-Box Transferability of Adversarial Examples With Dispersion Reduction** - Paper: https://openaccess.thecvf.com/content_CVPR_2020/papers/Lu_Enhancing_Cross-Task_Black-Box_Transferability_of_Adversarial_Examples_With_Dispersion_Reduction_CVPR_2020_paper.pdf - Code: https://github.com/erbloo/dr_cvpr20 **Towards Large yet Imperceptible Adversarial Image Perturbations with Perceptual Color Distance** - Paper: https://arxiv.org/abs/1911.02466 - Code: https://github.com/ZhengyuZhao/PerC-Adversarial # 3D Reconstruction **Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild** - **CVPR 2020 Best Paper** - Homepage: https://elliottwu.com/projects/unsup3d/ - Paper: https://arxiv.org/abs/1911.11130 - Code: https://github.com/elliottwu/unsup3d **Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization** - Homepage: https://shunsukesaito.github.io/PIFuHD/ - Paper: https://arxiv.org/abs/2004.00452 - Code: https://github.com/facebookresearch/pifuhd **TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style** - Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Patel_TailorNet_Predicting_Clothing_in_3D_as_a_Function_of_Human_CVPR_2020_paper.pdf - Code: https://github.com/chaitanya100100/TailorNet - Dataset: https://github.com/zycliao/TailorNet_dataset **Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion** -
Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Chibane_Implicit_Functions_in_Feature_Space_for_3D_Shape_Reconstruction_and_CVPR_2020_paper.pdf - Code: https://github.com/jchibane/if-net **Learning to Transfer Texture from Clothing Images to 3D Humans** - Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Mir_Learning_to_Transfer_Texture_From_Clothing_Images_to_3D_Humans_CVPR_2020_paper.pdf - Code: https://github.com/aymenmir1/pix2surf # Depth Completion **Uncertainty-Aware CNNs for Depth Completion: Uncertainty from Beginning to End** - Paper: https://arxiv.org/abs/2006.03349 - Code: https://github.com/abdo-eldesokey/pncnn # Semantic Scene Completion **3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior** - Paper: https://arxiv.org/abs/2003.14052 - Code: https://github.com/charlesCXK/TorchSSC # Image/Video Captioning **Syntax-Aware Action Targeting for Video Captioning** - Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Zheng_Syntax-Aware_Action_Targeting_for_Video_Captioning_CVPR_2020_paper.pdf - Code: https://github.com/SydCaption/SAAT # Wireframe Parsing **Holistically-Attracted Wireframe Parsing** - Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Xue_Holistically-Attracted_Wireframe_Parsing_CVPR_2020_paper.html - Code: https://github.com/cherubicXN/hawp # Datasets **OASIS: A Large-Scale Dataset for Single Image 3D in the Wild** - Paper: https://arxiv.org/abs/2007.13215 - Dataset: https://oasis.cs.princeton.edu/ **STEFANN: Scene Text Editor using Font Adaptive Neural Network** - Homepage: https://prasunroy.github.io/stefann/ - Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Roy_STEFANN_Scene_Text_Editor_Using_Font_Adaptive_Neural_Network_CVPR_2020_paper.html - Code: https://github.com/prasunroy/stefann - Dataset: https://drive.google.com/open?id=1sEDiX_jORh2X-HSzUnjIyZr-G9LJIw1k **Interactive Object Segmentation with Inside-Outside Guidance** - Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Interactive_Object_Segmentation_With_Inside-Outside_Guidance_CVPR_2020_paper.pdf - Code: https://github.com/shiyinzhang/Inside-Outside-Guidance -
数据集:https://github.com/shiyinzhang/Pixel-ImageNet **Video Panoptic Segmentation** - 论文:https://arxiv.org/abs/2006.11339 - 代码:https://github.com/mcahny/vps - 数据集:https://www.dropbox.com/s/ecem4kq0fdkver4/cityscapes-vps-dataset-1.0.zip?dl=0 **FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation** - 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Li_FSS-1000_A_1000-Class_Dataset_for_Few-Shot_Segmentation_CVPR_2020_paper.html - 代码:https://github.com/HKUSTCV/FSS-1000 - 数据集:https://github.com/HKUSTCV/FSS-1000 **3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset** - 主页:https://vap.aau.dk/3d-zef/ - 论文:https://arxiv.org/abs/2006.08466 - 代码:https://bitbucket.org/aauvap/3d-zef/src/master/ - 数据集:https://motchallenge.net/data/3D-ZeF20 **TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style** - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Patel_TailorNet_Predicting_Clothing_in_3D_as_a_Function_of_Human_CVPR_2020_paper.pdf - 代码:https://github.com/chaitanya100100/TailorNet - 数据集:https://github.com/zycliao/TailorNet_dataset **Oops! 
Predicting Unintentional Action in Video** - 主页:https://oops.cs.columbia.edu/ - 论文:https://arxiv.org/abs/1911.11206 - 代码:https://github.com/cvlab-columbia/oops - 数据集:https://oops.cs.columbia.edu/data **The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction** - 论文:https://arxiv.org/abs/1912.06445 - 代码:https://github.com/JunweiLiang/Multiverse - 数据集:https://next.cs.cmu.edu/multiverse/ **Open Compound Domain Adaptation** - 主页:https://liuziwei7.github.io/projects/CompoundDomain.html - 数据集:https://drive.google.com/drive/folders/1_uNTF8RdvhS_sqVTnYx17hEOQpefmE2r?usp=sharing - 论文:https://arxiv.org/abs/1909.03403 - 代码:https://github.com/zhmiao/OpenCompoundDomainAdaptation-OCDA **Intra- and Inter-Action Understanding via Temporal Action Parsing** - 论文:https://arxiv.org/abs/2005.10229 - 主页和数据集:https://sdolivia.github.io/TAPOS/ **Dynamic Refinement Network for Oriented and Densely Packed Object Detection** - 论文下载链接:https://arxiv.org/abs/2005.09973 - 代码和数据集:https://github.com/Anymake/DRN_CVPR2020 **COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification** - 论文:https://arxiv.org/abs/2005.07862 - 数据集:暂无 **KeypointNet: A Large-scale 3D Keypoint Dataset Aggregated from Numerous Human Annotations** - 论文:https://arxiv.org/abs/2002.12687 - 数据集:https://github.com/qq456cvb/KeypointNet **MSeg: A Composite Dataset for Multi-domain Semantic Segmentation** - 论文:http://vladlen.info/papers/MSeg.pdf - 代码:https://github.com/mseg-dataset/mseg-api - 数据集:https://github.com/mseg-dataset/mseg-semantic **AvatarMe: Realistically Renderable 3D Facial Reconstruction "in-the-wild"** - 论文:https://arxiv.org/abs/2003.13845 - 数据集:https://github.com/lattas/AvatarMe **Learning to Autofocus** - 论文:https://arxiv.org/abs/2004.12260 - 数据集:暂无 **FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction** - 论文:https://arxiv.org/abs/2003.13989 - 代码:https://github.com/zhuhao-nju/facescape **Bodies at Rest: 3D Human Pose and Shape Estimation from 
a Pressure Image using Synthetic Data** - 论文下载链接:https://arxiv.org/abs/2004.01166 - 代码:https://github.com/Healthcare-Robotics/bodies-at-rest - 数据集:https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KOA4ML **FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding** - 主页:https://sdolivia.github.io/FineGym/ - 论文:https://arxiv.org/abs/2004.06704 **A Local-to-Global Approach to Multi-modal Movie Scene Segmentation** - 主页:https://anyirao.com/projects/SceneSeg.html - 论文下载链接:https://arxiv.org/abs/2004.02678 - 代码:https://github.com/AnyiRao/SceneSeg **Deep Homography Estimation for Dynamic Scenes** - 论文:https://arxiv.org/abs/2004.02132 - 数据集:https://github.com/lcmhoang/hmg-dynamics **Assessing Image Quality Issues for Real-World Problems** - 主页:https://vizwiz.org/tasks-and-datasets/image-quality-issues/ - 论文:https://arxiv.org/abs/2003.12511 **UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World** - 论文:https://arxiv.org/abs/2003.10608 - 代码和数据集:https://github.com/Jyouhou/UnrealText/ **PANDA: A Gigapixel-level Human-centric Video Dataset** - 论文:https://arxiv.org/abs/2003.04852 - 数据集:http://www.panda-dataset.com/ **IntrA: 3D Intracranial Aneurysm Dataset for Deep Learning** - 论文:https://arxiv.org/abs/2003.02920 - 数据集:https://github.com/intra3d2019/IntrA **Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS** - 论文:https://arxiv.org/abs/2003.03972 - 数据集:暂无 # 其他 **CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus** - 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Kluger_CONSAC_Robust_Multi-Model_Fitting_by_Conditional_Sample_Consensus_CVPR_2020_paper.html - 代码:https://github.com/fkluger/consac **Learning to Learn Single Domain Generalization** - 论文:https://arxiv.org/abs/2003.13216 - 代码:https://github.com/joffery/M-ADA **Open Compound Domain Adaptation** - 主页:https://liuziwei7.github.io/projects/CompoundDomain.html - 
数据集:https://drive.google.com/drive/folders/1_uNTF8RdvhS_sqVTnYx17hEOQpefmE2r?usp=sharing - 论文:https://arxiv.org/abs/1909.03403 - 代码:https://github.com/zhmiao/OpenCompoundDomainAdaptation-OCDA **Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision** - 论文:http://www.cvlibs.net/publications/Niemeyer2020CVPR.pdf - 代码:https://github.com/autonomousvision/differentiable_volumetric_rendering **QEBA: Query-Efficient Boundary-Based Blackbox Attack** - 论文:https://arxiv.org/abs/2005.14137 - 代码:https://github.com/AI-secure/QEBA **Equalization Loss for Long-Tailed Object Recognition** - 论文:https://arxiv.org/abs/2003.05176 - 代码:https://github.com/tztztztztz/eql.detectron2 **Instance-aware Image Colorization** - 主页:https://ericsujw.github.io/InstColorization/ - 论文:https://arxiv.org/abs/2005.10825 - 代码:https://github.com/ericsujw/InstColorization **Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting** - 论文:https://arxiv.org/abs/2005.09704 - 代码:https://github.com/Atlas200dk/sample-imageinpainting-HiFill **Where am I looking at? 
Joint Location and Orientation Estimation by Cross-View Matching** - 论文:https://arxiv.org/abs/2005.03860 - 代码:https://github.com/shiyujiao/cross_view_localization_DSM **Epipolar Transformers** - 论文:https://arxiv.org/abs/2005.04551 - 代码:https://github.com/yihui-he/epipolar-transformers **Bringing Old Photos Back to Life** - 主页:http://raywzy.com/Old_Photo/ - 论文:https://arxiv.org/abs/2004.09484 **MaskFlownet: Asymmetric Feature Matching with Learnable Occlusion Mask** - 论文:https://arxiv.org/abs/2003.10955 - 代码:https://github.com/microsoft/MaskFlownet **Self-Supervised Viewpoint Learning from Image Collections** - 论文:https://arxiv.org/abs/2004.01793 - 论文2:https://research.nvidia.com/sites/default/files/pubs/2020-03_Self-Supervised-Viewpoint-Learning/SSV-CVPR2020.pdf - 代码:https://github.com/NVlabs/SSV **Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations** - Oral - 论文:https://arxiv.org/abs/2003.12237 - 代码:https://github.com/cuishuhao/BNM **Towards Learning Structure via Consensus for Face Segmentation and Parsing** - 论文:https://arxiv.org/abs/1911.00957 - 代码:https://github.com/isi-vista/structure_via_consensus **Plug-and-Play Algorithms for Large-scale Snapshot Compressive Imaging** - Oral - 论文:https://arxiv.org/abs/2003.13654 - 代码:https://github.com/liuyang12/PnP-SCI **Lightweight Photometric Stereo for Facial Details Recovery** - 论文:https://arxiv.org/abs/2003.12307 - 代码:https://github.com/Juyong/FacePSNet **Footprints and Free Space from a Single Color Image** - 论文:https://arxiv.org/abs/2004.06376 - 代码:https://github.com/nianticlabs/footprints **Self-Supervised Monocular Scene Flow Estimation** - 论文:https://arxiv.org/abs/2004.04143 - 代码:https://github.com/visinf/self-mono-sf **Quasi-Newton Solver for Robust Non-Rigid Registration** - 论文:https://arxiv.org/abs/2004.04322 - 代码:https://github.com/Juyong/Fast_RNRR **A Local-to-Global Approach to Multi-modal Movie Scene Segmentation** - 
主页:https://anyirao.com/projects/SceneSeg.html - 论文下载链接:https://arxiv.org/abs/2004.02678 - 代码:https://github.com/AnyiRao/SceneSeg **DeepFLASH: An Efficient Network for Learning-based Medical Image Registration** - 论文:https://arxiv.org/abs/2004.02097 - 代码:https://github.com/jw4hv/deepflash **Self-Supervised Scene De-occlusion** - 主页:https://xiaohangzhan.github.io/projects/deocclusion/ - 论文:https://arxiv.org/abs/2004.02788 - 代码:https://github.com/XiaohangZhan/deocclusion **Polarized Reflection Removal with Perfect Alignment in the Wild** - 主页:https://leichenyang.weebly.com/project-polarized.html - 代码:https://github.com/ChenyangLEI/CVPR2020-Polarized-Reflection-Removal-with-Perfect-Alignment **Background Matting: The World is Your Green Screen** - 论文:https://arxiv.org/abs/2004.00626 - 代码:http://github.com/senguptaumd/Background-Matting **What Deep CNNs Benefit from Global Covariance Pooling: An Optimization Perspective** - 论文:https://arxiv.org/abs/2003.11241 - 代码:https://github.com/ZhangLi-CS/GCP_Optimization **Look-into-Object: Self-supervised Structure Modeling for Object Recognition** - 论文:暂无 - 代码:https://github.com/JDAI-CV/LIO **Video Object Grounding using Semantic Roles in Language Description** - 论文:https://arxiv.org/abs/2003.10606 - 代码:https://github.com/TheShadow29/vognet-pytorch **Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives** - 论文:https://arxiv.org/abs/2003.10739 - 代码:https://github.com/d-li14/DHM **SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization** - 论文:http://www.cs.umd.edu/~yuejiang/papers/SDFDiff.pdf - 代码:https://github.com/YueJiang-nj/CVPR2020-SDFDiff **On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location** - 论文:https://arxiv.org/abs/2003.07064 - 代码:https://github.com/oskyhn/CNNs-Without-Borders **GhostNet: More Features from Cheap Operations** - 论文:https://arxiv.org/abs/1911.11907 - 代码:https://github.com/iamhankai/ghostnet **AdderNet: Do We Really 
Need Multiplications in Deep Learning?** - Paper: https://arxiv.org/abs/1912.13200 - Code: https://github.com/huawei-noah/AdderNet **Deep Image Harmonization via Domain Verification** - Paper: https://arxiv.org/abs/1911.13239 - Code: https://github.com/bcmi/Image_Harmonization_Datasets **Blurry Video Frame Interpolation** - Paper: https://arxiv.org/abs/2002.12259 - Code: https://github.com/laomao0/BIN **Extremely Dense Point Correspondences using a Learned Feature Descriptor** - Paper: https://arxiv.org/abs/2003.00619 - Code: https://github.com/lppllppl920/DenseDescriptorLearning-Pytorch **Filter Grafting for Deep Neural Networks** - Paper: https://arxiv.org/abs/2001.05868 - Code: https://github.com/fxmeng/filter-grafting - Paper walkthrough: https://www.zhihu.com/question/372070853/answer/1041569335 **Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation** - Paper: https://arxiv.org/abs/2003.02824 - Code: https://github.com/cmhungsteve/SSTDA **Detecting Attended Visual Targets in Video** - Paper: https://arxiv.org/abs/2003.02501 - Code: https://github.com/ejcgt/attention-target-detection **Deep Image Spatial Transformation for Person Image Generation** - Paper: https://arxiv.org/abs/2003.00696 - Code: https://github.com/RenYurui/Global-Flow-Local-Attention **Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications** - Paper: https://arxiv.org/abs/2003.01455 - Code: https://github.com/bbrattoli/ZeroShotVideoClassification https://github.com/charlesCXK/3D-SketchAware-SSC https://github.com/Anonymous20192020/Anonymous_CVPR5767 https://github.com/avirambh/ScopeFlow https://github.com/csbhr/CDVD-TSP https://github.com/ymcidence/TBH https://github.com/yaoyao-liu/mnemonics https://github.com/meder411/Tangent-Images https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch https://github.com/sjmoran/deep_local_parametric_filters https://github.com/bermanmaxim/AOWS https://github.com/dc3ea9f/look-into-object # Not Sure Whether Accepted **FADNet: A Fast and Accurate
Network for Disparity Estimation** - 论文:还没出来 - 代码:https://github.com/HKBU-HPML/FADNet https://github.com/rFID-submit/RandomFID:不确定中没中 https://github.com/JackSyu/AE-MSR:不确定中没中 https://github.com/fastconvnets/cvpr2020:不确定中没中 https://github.com/aimagelab/meshed-memory-transformer:不确定中没中 https://github.com/TWSFar/CRGNet:不确定中没中 https://github.com/CVPR-2020/CDARTS:不确定中没中 https://github.com/anucvml/ddn-cvprw2020:不确定中没中 https://github.com/dl-model-recommend/model-trust:不确定中没中 https://github.com/apratimbhattacharyya18/CVPR-2020-Corr-Prior:不确定中没中 https://github.com/onetcvpr/O-Net:不确定中没中 https://github.com/502463708/Microcalcification_Detection:不确定中没中 https://github.com/anonymous-for-review/cvpr-2020-deep-smoke-machine:不确定中没中 https://github.com/anonymous-for-review/cvpr-2020-smoke-recognition-dataset:不确定中没中 https://github.com/cvpr-nonrigid/dataset:不确定中没中 https://github.com/theFool32/PPBA:不确定中没中 https://github.com/Realtime-Action-Recognition/Realtime-Action-Recognition ================================================ FILE: CVPR2021-Papers-with-Code.md ================================================ # CVPR 2021 论文和开源项目合集(Papers with Code) [CVPR 2021](http://cvpr2021.thecvf.com/) 论文和开源项目合集(papers with code)! CVPR 2021 收录列表:http://cvpr2021.thecvf.com/sites/default/files/2021-03/accepted_paper_ids.txt > 注1:欢迎各位大佬提交issue,分享CVPR 2021论文和开源项目! 
> > Note 2: For top CV conference papers from previous years, other high-quality CV papers, and roundups, see https://github.com/amusi/daily-paper-computer-vision To follow the latest high-quality CV papers, open-source projects, and learning materials, scan the QR code to join the CVer academic discussion group. ![](CVer学术交流群.png) ## CVPR 2021 Papers with Code: Table of Contents - [Best Paper](#Best-Paper) - [Backbone](#Backbone) - [NAS](#NAS) - [GAN](#GAN) - [VAE](#VAE) - [Visual Transformer](#Visual-Transformer) - [Regularization](#Regularization) - [SLAM](#SLAM) - [长尾分布(Long-Tailed)](#Long-Tailed) - [数据增广(Data Augmentation)](#DA) - [无监督/自监督(Self-Supervised)](#Un/Self-Supervised) - [半监督(Semi-Supervised)](#Semi-Supervised) - [胶囊网络(Capsule Network)](#Capsule-Network) - [图像分类(Image Classification)](#Image-Classification) - [2D目标检测(Object Detection)](#Object-Detection) - [单/多目标跟踪(Object Tracking)](#Object-Tracking) - [语义分割(Semantic Segmentation)](#Semantic-Segmentation) - [实例分割(Instance Segmentation)](#Instance-Segmentation) - [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation) - [医学图像分割(Medical Image Segmentation)](#Medical-Image-Segmentation) - [视频目标分割(Video-Object-Segmentation)](#VOS) - [交互式视频目标分割(Interactive-Video-Object-Segmentation)](#IVOS) - [显著性检测(Saliency Detection)](#Saliency-Detection) - [伪装物体检测(Camouflaged Object Detection)](#Camouflaged-Object-Detection) - [协同显著性检测(Co-Salient Object Detection)](#CoSOD) - [图像抠图(Image Matting)](#Matting) - [行人重识别(Person Re-identification)](#Re-ID) - [行人搜索(Person Search)](#Person-Search) - [视频理解/行为识别(Video Understanding)](#Video-Understanding) - [人脸识别(Face Recognition)](#Face-Recognition) - [人脸检测(Face Detection)](#Face-Detection) - [人脸活体检测(Face Anti-Spoofing)](#Face-Anti-Spoofing) - [Deepfake检测(Deepfake Detection)](#Deepfake-Detection) - [人脸年龄估计(Age-Estimation)](#Age-Estimation) - [人脸表情识别(Facial-Expression-Recognition)](#FER) - [Deepfakes](#Deepfakes) - [人体解析(Human Parsing)](#Human-Parsing) - [2D/3D人体姿态估计(2D/3D Human Pose Estimation)](#Human-Pose-Estimation) - [动物姿态估计(Animal Pose Estimation)](#Animal-Pose-Estimation) - [手部姿态估计(Hand Pose Estimation)](#Hand-Pose-Estimation) - [Human Volumetric
Capture](#Human-Volumetric-Capture) - [场景文本识别(Scene Text Recognition)](#Scene-Text-Recognition) - [图像压缩(Image Compression)](#Image-Compression) - [模型压缩/剪枝/量化](#Model-Compression) - [知识蒸馏(Knowledge Distillation)](#KD) - [超分辨率(Super-Resolution)](#Super-Resolution) - [去雾(Dehazing)](#Dehazing) - [图像恢复(Image Restoration)](#Image-Restoration) - [图像补全(Image Inpainting)](#Image-Inpainting) - [图像编辑(Image Editing)](#Image-Editing) - [图像描述(Image Captioning)](#Image-Captioning) - [字体生成(Font Generation)](#Font-Generation) - [图像匹配(Image Matching)](#Image-Matching) - [图像融合(Image Blending)](#Image-Blending) - [反光去除(Reflection Removal)](#Reflection-Removal) - [3D点云分类(3D Point Clouds Classification)](#3D-C) - [3D目标检测(3D Object Detection)](#3D-Object-Detection) - [3D语义分割(3D Semantic Segmentation)](#3D-Semantic-Segmentation) - [3D全景分割(3D Panoptic Segmentation)](#3D-Panoptic-Segmentation) - [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking) - [3D点云配准(3D Point Cloud Registration)](#3D-PointCloud-Registration) - [3D点云补全(3D-Point-Cloud-Completion)](#3D-Point-Cloud-Completion) - [3D重建(3D Reconstruction)](#3D-Reconstruction) - [6D位姿估计(6D Pose Estimation)](#6D-Pose-Estimation) - [相机姿态估计(Camera Pose Estimation)](#Camera-Pose-Estimation) - [深度估计(Depth Estimation)](#Depth-Estimation) - [立体匹配(Stereo Matching)](#Stereo-Matching) - [光流估计(Flow Estimation)](#Flow-Estimation) - [车道线检测(Lane Detection)](#Lane-Detection) - [轨迹预测(Trajectory Prediction)](#Trajectory-Prediction) - [人群计数(Crowd Counting)](#Crowd-Counting) - [对抗样本(Adversarial-Examples)](#AE) - [图像检索(Image Retrieval)](#Image-Retrieval) - [视频检索(Video Retrieval)](#Video-Retrieval) - [跨模态检索(Cross-modal Retrieval)](#Cross-modal-Retrieval) - [Zero-Shot Learning](#Zero-Shot-Learning) - [联邦学习(Federated Learning)](#Federated-Learning) - [视频插帧(Video Frame Interpolation)](#Video-Frame-Interpolation) - [视觉推理(Visual Reasoning)](#Visual-Reasoning) - [图像合成(Image Synthesis)](#Image-Synthesis) - [视图合成(Visual Synthesis)](#Visual-Synthesis) - [风格迁移(Style 
Transfer)](#Style-Transfer) - [布局生成(Layout Generation)](#Layout-Generation) - [Domain Generalization](#Domain-Generalization) - [Domain Adaptation](#Domain-Adaptation) - [Open-Set](#Open-Set) - [Adversarial Attack](#Adversarial-Attack) - ["人-物"交互(HOI)检测](#HOI) - [阴影去除(Shadow Removal)](#Shadow-Removal) - [虚拟试衣(Virtual Try-On)](#Virtual-Try-On) - [标签噪声(Label Noise)](#Label-Noise) - [视频稳像(Video Stabilization)](#Video-Stabilization) - [数据集(Datasets)](#Datasets) - [其他(Others)](#Others) - [待添加(TODO)](#TO-DO) - [不确定中没中(Not Sure)](#Not-Sure) # Best Paper **GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields** - Homepage: https://m-niemeyer.github.io/project-pages/giraffe/index.html - Paper(Oral): https://arxiv.org/abs/2011.12100 - Code: https://github.com/autonomousvision/giraffe - Demo: http://www.youtube.com/watch?v=fIaDXC-qRSg&vq=hd1080&autoplay=1 # Backbone **HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers** - Paper(Oral): https://arxiv.org/abs/2106.06560 - Code: https://github.com/dingmyu/HR-NAS **BCNet: Searching for Network Width with Bilaterally Coupled Network** - Paper: https://arxiv.org/abs/2105.10533 - Code: None **Decoupled Dynamic Filter Networks** - Homepage: https://thefoxofsky.github.io/project_pages/ddf - Paper: https://arxiv.org/abs/2104.14107 - Code: https://github.com/thefoxofsky/DDF **Lite-HRNet: A Lightweight High-Resolution Network** - Paper: https://arxiv.org/abs/2104.06403 - Code: https://github.com/HRNet/Lite-HRNet **CondenseNet V2: Sparse Feature Reactivation for Deep Networks** - Paper: https://arxiv.org/abs/2104.04382 - Code: https://github.com/jianghaojun/CondenseNetV2 **Diverse Branch Block: Building a Convolution as an Inception-like Unit** - Paper: https://arxiv.org/abs/2103.13425 - Code: https://github.com/DingXiaoH/DiverseBranchBlock **Scaling Local Self-Attention For Parameter Efficient Visual Backbones** - Paper(Oral): https://arxiv.org/abs/2103.12731 - Code: None
**ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network** - Paper: https://arxiv.org/abs/2007.00992 - Code: https://github.com/clovaai/rexnet **Involution: Inverting the Inherence of Convolution for Visual Recognition** - Paper: https://arxiv.org/abs/2103.06255 - Code: https://github.com/d-li14/involution **Coordinate Attention for Efficient Mobile Network Design** - Paper: https://arxiv.org/abs/2103.02907 - Code: https://github.com/Andrew-Qibin/CoordAttention **Inception Convolution with Efficient Dilation Search** - Paper: https://arxiv.org/abs/2012.13587 - Code: https://github.com/yifan123/IC-Conv **RepVGG: Making VGG-style ConvNets Great Again** - Paper: https://arxiv.org/abs/2101.03697 - Code: https://github.com/DingXiaoH/RepVGG # NAS **HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers** - Paper(Oral): https://arxiv.org/abs/2106.06560 - Code: https://github.com/dingmyu/HR-NAS **BCNet: Searching for Network Width with Bilaterally Coupled Network** - Paper: https://arxiv.org/abs/2105.10533 - Code: None **ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search** - Paper: https://arxiv.org/abs/2105.10154 - Code: None **Combined Depth Space based Architecture Search For Person Re-identification** - Paper: https://arxiv.org/abs/2104.04163 - Code: None **DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation** - Paper(Oral): https://arxiv.org/abs/2103.15954 - Code: None **HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers** - Paper(Oral): None - Code: https://github.com/dingmyu/HR-NAS **Neural Architecture Search with Random Labels** - Paper: https://arxiv.org/abs/2101.11834 - Code: None **Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search** - Paper: https://arxiv.org/abs/2101.11342 - Code: None **Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic 
Distillation** - Paper: https://arxiv.org/abs/2105.12971 - Code: None **Prioritized Architecture Sampling with Monto-Carlo Tree Search** - Paper: https://arxiv.org/abs/2103.11922 - Code: https://github.com/xiusu/NAS-Bench-Macro **Contrastive Neural Architecture Search with Neural Architecture Comparators** - Paper: https://arxiv.org/abs/2103.05471 - Code: https://github.com/chenyaofo/CTNAS **AttentiveNAS: Improving Neural Architecture Search via Attentive** - Paper: https://arxiv.org/abs/2011.09011 - Code: None **ReNAS: Relativistic Evaluation of Neural Architecture Search** - Paper: https://arxiv.org/abs/1910.01523 - Code: None **HourNAS: Extremely Fast Neural Architecture** - Paper: https://arxiv.org/abs/2005.14446 - Code: None **Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator** - Paper: https://arxiv.org/abs/2103.07289 - Code: https://github.com/eric8607242/SGNAS **OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection** - Paper: https://arxiv.org/abs/2103.04507 - Code: https://github.com/VDIGPKU/OPANAS **Inception Convolution with Efficient Dilation Search** - Paper: https://arxiv.org/abs/2012.13587 - Code: None # GAN **High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network** - Paper: https://arxiv.org/abs/2105.09188 - Code: https://github.com/csjliang/LPTN - Dataset: https://github.com/csjliang/LPTN **DG-Font: Deformable Generative Networks for Unsupervised Font Generation** - Paper: https://arxiv.org/abs/2104.03064 - Code: https://github.com/ecnuycxie/DG-Font **PD-GAN: Probabilistic Diverse GAN for Image Inpainting** - Paper: https://arxiv.org/abs/2105.02201 - Code: https://github.com/KumapowerLIU/PD-GAN **StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing** - Paper: https://arxiv.org/abs/2104.14754 - Code: https://github.com/naver-ai/StyleMapGAN - Demo Video: https://youtu.be/qCapNyRA_Ng **Drafting and 
Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer** - Paper: https://arxiv.org/abs/2104.05376 - Code: https://github.com/PaddlePaddle/PaddleGAN/ **Regularizing Generative Adversarial Networks under Limited Data** - Homepage: https://hytseng0509.github.io/lecam-gan/ - Paper: https://faculty.ucmerced.edu/mhyang/papers/cvpr2021_gan_limited_data.pdf - Code: https://github.com/google/lecam-gan **Towards Real-World Blind Face Restoration with Generative Facial Prior** - Paper: https://arxiv.org/abs/2101.04061 - Code: None **TediGAN: Text-Guided Diverse Image Generation and Manipulation** - Homepage: https://xiaweihao.com/projects/tedigan/ - Paper: https://arxiv.org/abs/2012.03308 - Code: https://github.com/weihaox/TediGAN **Generative Hierarchical Features from Synthesizing Image** - Homepage: https://genforce.github.io/ghfeat/ - Paper(Oral): https://arxiv.org/abs/2007.10379 - Code: https://github.com/genforce/ghfeat **Teachers Do More Than Teach: Compressing Image-to-Image Models** - Paper: https://arxiv.org/abs/2103.03467 - Code: https://github.com/snap-research/CAT **HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms** - Paper: https://arxiv.org/abs/2011.11731 - Code: https://github.com/mahmoudnafifi/HistoGAN **pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis** - Homepage: https://marcoamonteiro.github.io/pi-GAN-website/ - Paper(Oral): https://arxiv.org/abs/2012.00926 - Code: None **DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network** - Paper: https://arxiv.org/abs/2103.07893 - Code: None **Diverse Semantic Image Synthesis via Probability Distribution Modeling** - Paper: https://arxiv.org/abs/2103.06878 - Code: https://github.com/tzt101/INADE.git **LOHO: Latent Optimization of Hairstyles via Orthogonalization** - Paper: https://arxiv.org/abs/2103.03891 - Code: None **PISE: Person Image Synthesis and Editing with Decoupled GAN** - 
Paper: https://arxiv.org/abs/2103.04023 - Code: https://github.com/Zhangjinso/PISE **DeFLOCNet: Deep Image Editing via Flexible Low-level Controls** - Paper: http://raywzy.com/ - Code: http://raywzy.com/ **PD-GAN: Probabilistic Diverse GAN for Image Inpainting** - Paper: http://raywzy.com/ - Code: http://raywzy.com/ **Efficient Conditional GAN Transfer with Knowledge Propagation across Classes** - Paper: https://www.researchgate.net/publication/349309756_Efficient_Conditional_GAN_Transfer_with_Knowledge_Propagation_across_Classes - Code: http://github.com/mshahbazi72/cGANTransfer **Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing** - Paper: None - Code: None **Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs** - Paper: https://arxiv.org/abs/2011.14107 - Code: None **Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation** - Homepage: https://eladrich.github.io/pixel2style2pixel/ - Paper: https://arxiv.org/abs/2008.00951 - Code: https://github.com/eladrich/pixel2style2pixel **A 3D GAN for Improved Large-pose Facial Recognition** - Paper: https://arxiv.org/abs/2012.10545 - Code: None **HumanGAN: A Generative Model of Humans Images** - Paper: https://arxiv.org/abs/2103.06902 - Code: None **ID-Unet: Iterative Soft and Hard Deformation for View Synthesis** - Paper: https://arxiv.org/abs/2103.02264 - Code: https://github.com/MingyuY/Iterative-view-synthesis **CoMoGAN: continuous model-guided image-to-image translation** - Paper(Oral): https://arxiv.org/abs/2103.06879 - Code: https://github.com/cv-rits/CoMoGAN **Training Generative Adversarial Networks in One Stage** - Paper: https://arxiv.org/abs/2103.00430 - Code: None **Closed-Form Factorization of Latent Semantics in GANs** - Homepage: https://genforce.github.io/sefa/ - Paper(Oral): https://arxiv.org/abs/2007.06600 - Code: https://github.com/genforce/sefa **Anycost GANs for Interactive Image Synthesis and Editing** - Paper: https://arxiv.org/abs/2103.03243 - Code: 
https://github.com/mit-han-lab/anycost-gan **Image-to-image Translation via Hierarchical Style Disentanglement** - Paper: https://arxiv.org/abs/2103.01456 - Code: https://github.com/imlixinyang/HiSD # VAE **Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders** - Homepage: https://taldatech.github.io/soft-intro-vae-web/ - Paper: https://arxiv.org/abs/2012.13253 - Code: https://github.com/taldatech/soft-intro-vae-pytorch # Visual Transformer **1. End-to-End Human Pose and Mesh Reconstruction with Transformers** - Paper: https://arxiv.org/abs/2012.09760 - Code: https://github.com/microsoft/MeshTransformer **2. Temporal-Relational CrossTransformers for Few-Shot Action Recognition** - Paper: https://arxiv.org/abs/2101.06184 - Code: https://github.com/tobyperrett/trx **3. Kaleido-BERT:Vision-Language Pre-training on Fashion Domain** - Paper: https://arxiv.org/abs/2103.16110 - Code: https://github.com/mczhuge/Kaleido-BERT **4. HOTR: End-to-End Human-Object Interaction Detection with Transformers** - Paper: https://arxiv.org/abs/2104.13682 - Code: https://github.com/kakaobrain/HOTR **5. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving** - Paper: https://arxiv.org/abs/2104.09224 - Code: https://github.com/autonomousvision/transfuser **6. Pose Recognition with Cascade Transformers** - Paper: https://arxiv.org/abs/2104.06976 - Code: https://github.com/mlpc-ucsd/PRTR **7. Variational Transformer Networks for Layout Generation** - Paper: https://arxiv.org/abs/2104.02416 - Code: None **8. LoFTR: Detector-Free Local Feature Matching with Transformers** - Homepage: https://zju3dv.github.io/loftr/ - Paper: https://arxiv.org/abs/2104.00680 - Code: https://github.com/zju3dv/LoFTR **9. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers** - Paper: https://arxiv.org/abs/2012.15840 - Code: https://github.com/fudan-zvg/SETR **10. 
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers** - Paper: https://arxiv.org/abs/2103.16553 - Code: None **11. Transformer Tracking** - Paper: https://arxiv.org/abs/2103.15436 - Code: https://github.com/chenxin-dlut/TransT **12. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers** - Paper(Oral): https://arxiv.org/abs/2106.06560 - Code: https://github.com/dingmyu/HR-NAS **13. MIST: Multiple Instance Spatial Transformer** - Paper: https://arxiv.org/abs/1811.10725 - Code: None **14. Multimodal Motion Prediction with Stacked Transformers** - Paper: https://arxiv.org/abs/2103.11624 - Code: https://decisionforce.github.io/mmTransformer **15. Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning** - Paper: https://www.amazon.science/publications/revamping-cross-modal-recipe-retrieval-with-hierarchical-transformers-and-self-supervised-learning - Code: https://github.com/amzn/image-to-recipe-transformers **16. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking** - Paper(Oral): https://arxiv.org/abs/2103.11681 - Code: https://github.com/594422814/TransformerTrack **17. Pre-Trained Image Processing Transformer** - Paper: https://arxiv.org/abs/2012.00364 - Code: None **18. End-to-End Video Instance Segmentation with Transformers** - Paper(Oral): https://arxiv.org/abs/2011.14503 - Code: https://github.com/Epiphqny/VisTR **19. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers** - Paper(Oral): https://arxiv.org/abs/2011.09094 - Code: https://github.com/dddzg/up-detr **20. End-to-End Human Object Interaction Detection with HOI Transformer** - Paper: https://arxiv.org/abs/2103.04503 - Code: https://github.com/bbepoch/HoiTransformer **21. Transformer Interpretability Beyond Attention Visualization** - Paper: https://arxiv.org/abs/2012.09838 - Code: https://github.com/hila-chefer/Transformer-Explainability **22. 
Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer** - Paper: None - Code: None **23. LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity** - Paper: None - Code: None **24. Line Segment Detection Using Transformers without Edges** - Paper(Oral): https://arxiv.org/abs/2101.01909 - Code: None **25. MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers** - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_MaX-DeepLab_End-to-End_Panoptic_Segmentation_With_Mask_Transformers_CVPR_2021_paper.html - Code: None **26. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation** - Paper(Oral): https://arxiv.org/abs/2101.08833 - Code: https://github.com/dukebw/SSTVOS **27. Facial Action Unit Detection With Transformers** - Paper: None - Code: None **28. Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition** - Paper: None - Code: None **29. Lesion-Aware Transformers for Diabetic Retinopathy Grading** - Paper: None - Code: None **30. Topological Planning With Transformers for Vision-and-Language Navigation** - Paper: https://arxiv.org/abs/2012.05292 - Code: None **31. Adaptive Image Transformer for One-Shot Object Detection** - Paper: None - Code: None **32. Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos** - Paper: None - Code: None **33. Taming Transformers for High-Resolution Image Synthesis** - Homepage: https://compvis.github.io/taming-transformers/ - Paper(Oral): https://arxiv.org/abs/2012.09841 - Code: https://github.com/CompVis/taming-transformers **34. Self-Supervised Video Hashing via Bidirectional Transformers** - Paper: None - Code: None **35. Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos** - Paper(Oral): https://hehefan.github.io/pdfs/p4transformer.pdf - Code: None **36. 
Gaussian Context Transformer** - Paper: None - Code: None **37. General Multi-Label Image Classification With Transformers** - Paper: https://arxiv.org/abs/2011.14027 - Code: None **38. Bottleneck Transformers for Visual Recognition** - Paper: https://arxiv.org/abs/2101.11605 - Code: None **39. VLN BERT: A Recurrent Vision-and-Language BERT for Navigation** - Paper(Oral): https://arxiv.org/abs/2011.13922 - Code: https://github.com/YicongHong/Recurrent-VLN-BERT **40. Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling** - Paper(Oral): https://arxiv.org/abs/2102.06183 - Code: https://github.com/jayleicn/ClipBERT **41. Self-attention based Text Knowledge Mining for Text Detection** - Paper: None - Code: https://github.com/CVI-SZU/STKM **42. SSAN: Separable Self-Attention Network for Video Representation Learning** - Paper: None - Code: None **43. Scaling Local Self-Attention For Parameter Efficient Visual Backbones** - Paper(Oral): https://arxiv.org/abs/2103.12731 - Code: None # Regularization **Regularizing Neural Networks via Adversarial Model Perturbation** - Paper: https://arxiv.org/abs/2010.04925 - Code: https://github.com/hiyouga/AMP-Regularizer # SLAM **Differentiable SLAM-net: Learning Particle SLAM for Visual Navigation** - Paper: https://arxiv.org/abs/2105.07593 - Code: None **Generalizing to the Open World: Deep Visual Odometry with Online Adaptation** - Paper: https://arxiv.org/abs/2103.15279 - Code: https://arxiv.org/abs/2103.15279 # Long-Tailed **Adversarial Robustness under Long-Tailed Distribution** - Paper(Oral): https://arxiv.org/abs/2104.02703 - Code: https://github.com/wutong16/Adversarial_Long-Tail **Distribution Alignment: A Unified Framework for Long-tail Visual Recognition** - Paper: https://arxiv.org/abs/2103.16370 - Code: https://github.com/Megvii-BaseDetection/DisAlign **Adaptive Class Suppression Loss for Long-Tail Object Detection** - Paper: https://arxiv.org/abs/2104.00885 - Code: 
https://github.com/CASIA-IVA-Lab/ACSL **Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification** - Paper: https://arxiv.org/abs/2103.14267 - Code: None # Data Augmentation **Scale-aware Automatic Augmentation for Object Detection** - Paper: https://arxiv.org/abs/2103.17220 - Code: https://github.com/Jia-Research-Lab/SA-AutoAug # Un/Self-Supervised **Domain-Specific Suppression for Adaptive Object Detection** - Paper: https://arxiv.org/abs/2105.03570 - Code: None **A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning** - Paper: https://arxiv.org/abs/2104.14558 - Code: https://github.com/facebookresearch/SlowFast **Unsupervised Multi-Source Domain Adaptation for Person Re-Identification** - Paper: https://arxiv.org/abs/2104.12961 - Code: None **Self-supervised Video Representation Learning by Context and Motion Decoupling** - Paper: https://arxiv.org/abs/2104.00862 - Code: None **Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning** - Homepage: https://fingerrec.github.io/index_files/jinpeng/papers/CVPR2021/project_website.html - Paper: https://arxiv.org/abs/2009.05769 - Code: https://github.com/FingerRec/BE **Spatially Consistent Representation Learning** - Paper: https://arxiv.org/abs/2103.06122 - Code: None **VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples** - Paper: https://arxiv.org/abs/2103.05905 - Code: https://github.com/tinapan-pt/VideoMoCo **Exploring Simple Siamese Representation Learning** - Paper(Oral): https://arxiv.org/abs/2011.10566 - Code: None **Dense Contrastive Learning for Self-Supervised Visual Pre-Training** - Paper(Oral): https://arxiv.org/abs/2011.09157 - Code: https://github.com/WXinlong/DenseCL # Semi-Supervised **Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework** - Affiliation: Alibaba - Paper: https://arxiv.org/abs/2103.11402 - Code: None 
**Adaptive Consistency Regularization for Semi-Supervised Transfer Learning** - Paper: https://arxiv.org/abs/2103.02193 - Code: https://github.com/SHI-Labs/Semi-Supervised-Transfer-Learning # Capsule Network **Capsule Network is Not More Robust than Convolutional Network** - Paper: https://arxiv.org/abs/2103.15459 - Code: None # Image Classification **Correlated Input-Dependent Label Noise in Large-Scale Image Classification** - Paper(Oral): https://arxiv.org/abs/2105.10305 - Code: https://github.com/google/uncertainty-baselines/tree/master/baselines/imagenet # 2D Object Detection ## 2D Object Detection **1. Scaled-YOLOv4: Scaling Cross Stage Partial Network** - Affiliation: Academia Sinica, Intel, Providence University - Paper: https://arxiv.org/abs/2011.08036 - Code: https://github.com/WongKinYiu/ScaledYOLOv4 - Chinese write-up: [The official improved YOLOv4 is here! 55.8% AP! Up to 1774 FPS, Scaled-YOLOv4 is officially open source!](https://mp.weixin.qq.com/s/AcrJPNoAVhn8cGBUGK7ekA) **2. You Only Look One-level Feature** - Affiliation: Chinese Academy of Sciences, University of Chinese Academy of Sciences, Megvii - Paper: https://arxiv.org/abs/2103.09460 - Code: https://github.com/megvii-model/YOLOF - Chinese write-up: [CVPR 2021 | No FPN! CAS & Megvii propose YOLOF: You Only Look One-level Feature](https://mp.weixin.qq.com/s/EJqAG1gTVaP2icI6QL742A) **3. Sparse R-CNN: End-to-End Object Detection with Learnable Proposals** - Affiliation: The University of Hong Kong, Tongji University, ByteDance AI Lab, UC Berkeley - Paper: https://arxiv.org/abs/2011.12450 - Code: https://github.com/PeizeSun/SparseR-CNN - Chinese write-up: [A new paradigm for object detection! HKU, Tongji, and Berkeley propose Sparse R-CNN, code just open-sourced!](https://mp.weixin.qq.com/s/P2Zgh1wTqf8L2976El5nfQ) **4. End-to-End Object Detection with Fully Convolutional Network** - Affiliation: Megvii, Xi'an Jiaotong University - Paper: https://arxiv.org/abs/2012.03544 - Code: https://github.com/Megvii-BaseDetection/DeFCN **5. Dynamic Head: Unifying Object Detection Heads with Attentions** - Affiliation: Microsoft - Paper: https://arxiv.org/abs/2106.08322 - Code: https://github.com/microsoft/DynamicHead - Chinese write-up: [60.6 AP! Breaking the COCO record! Microsoft proposes DyHead: unifying attention with object detection heads](https://mp.weixin.qq.com/s/uYPUqVXwNau71VAYW3bYIA) **6. 
Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection** - Affiliation: Nanjing University of Science and Technology, Momenta, Nanjing University, Tsinghua University - Paper: https://arxiv.org/abs/2011.12885 - Code: https://github.com/implus/GFocalV2 - Chinese write-up: [CVPR 2021 | GFLV2: a practical object detection technique that improves accuracy at no extra cost!](https://mp.weixin.qq.com/s/JB7k3NwXU-cDueg6w9mghQ) **7. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers** - Affiliation: South China University of Technology, Tencent WeChat AI - Paper(Oral): https://arxiv.org/abs/2011.09094 - Code: https://github.com/dddzg/up-detr - Chinese write-up: [CVPR 2021 Oral | Transformers strike again! SCUT and WeChat propose UP-DETR: an unsupervised pre-trained detector](https://mp.weixin.qq.com/s/Hprp7B16SGFhVEKXfKiRBQ) **8. MobileDets: Searching for Object Detection Architectures for Mobile Accelerators** - Affiliation: University of Wisconsin, Google - Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Xiong_MobileDets_Searching_for_Object_Detection_Architectures_for_Mobile_Accelerators_CVPR_2021_paper.pdf - Code: https://github.com/tensorflow/models/tree/master/research/object_detection **9. Tracking Pedestrian Heads in Dense Crowd** - Affiliation: University of Rennes 1 - Homepage: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/ - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sundararaman_Tracking_Pedestrian_Heads_in_Dense_Crowd_CVPR_2021_paper.html - Code1: https://github.com/Sentient07/HeadHunter - Code2: https://github.com/Sentient07/HeadHunter%E2%80%93T - Dataset: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/ **10. Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation** - Affiliation: Hong Kong University of Science and Technology, Huawei Noah's Ark Lab - Paper: https://arxiv.org/abs/2105.12971 - Code: None **11. PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery** - Affiliation: A*star, Sichuan University, Nanyang Technological University - Paper: https://arxiv.org/abs/2105.12990 - Code: None **12. IQDet: Instance-wise Quality Distribution Sampling for Object Detection** - Affiliation: Megvii - Paper: https://arxiv.org/abs/2104.06936 - Code: None **13. 
Multi-Scale Aligned Distillation for Low-Resolution Detection** - Affiliation: The Chinese University of Hong Kong, Adobe Research, SmartMore - Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf - Code: https://github.com/Jia-Research-Lab/MSAD **14. Adaptive Class Suppression Loss for Long-Tail Object Detection** - Affiliation: Chinese Academy of Sciences, University of Chinese Academy of Sciences, ObjectEye, Peking University, Peng Cheng Laboratory, Nexwise - Paper: https://arxiv.org/abs/2104.00885 - Code: https://github.com/CASIA-IVA-Lab/ACSL **15. VarifocalNet: An IoU-aware Dense Object Detector** - Affiliation: Queensland University of Technology, University of Queensland - Paper(Oral): https://arxiv.org/abs/2008.13367 - Code: https://github.com/hyz-xmaster/VarifocalNet **16. OTA: Optimal Transport Assignment for Object Detection** - Affiliation: Waseda University, Megvii - Paper: https://arxiv.org/abs/2103.14259 - Code: https://github.com/Megvii-BaseDetection/OTA **17. Distilling Object Detectors via Decoupled Features** - Affiliation: Huawei Noah's Ark Lab, University of Sydney - Paper: https://arxiv.org/abs/2103.14475 - Code: https://github.com/ggjy/DeFeat.pytorch **18. Robust and Accurate Object Detection via Adversarial Learning** - Affiliation: Google, UCLA, UCSC - Paper: https://arxiv.org/abs/2103.13886 - Code: None **19. OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection** - Affiliation: Peking University, Anyvision, Stony Brook University - Paper: https://arxiv.org/abs/2103.04507 - Code: https://github.com/VDIGPKU/OPANAS **20. Multiple Instance Active Learning for Object Detection** - Affiliation: University of Chinese Academy of Sciences, Huawei Noah's Ark Lab, Tsinghua University - Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Yuan_Multiple_Instance_Active_Learning_for_Object_Detection_CVPR_2021_paper.pdf - Code: https://github.com/yuantn/MI-AOD **21. Towards Open World Object Detection** - Affiliation: Indian Institute of Technology, MBZUAI, Australian National University, Linköping University - Paper(Oral): https://arxiv.org/abs/2103.02603 - Code: https://github.com/JosephKJ/OWOD **22. 
RankDetNet: Delving Into Ranking Constraints for Object Detection** - Affiliation: Xilinx - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_RankDetNet_Delving_Into_Ranking_Constraints_for_Object_Detection_CVPR_2021_paper.html - Code: None ## Rotated Object Detection **23. Dense Label Encoding for Boundary Discontinuity Free Rotation Detection** - Affiliation: Shanghai Jiao Tong University, University of Chinese Academy of Sciences - Paper: https://arxiv.org/abs/2011.09670 - Code1: https://github.com/Thinklab-SJTU/DCL_RetinaNet_Tensorflow - Code2: https://github.com/yangxue0827/RotationDetection **24. ReDet: A Rotation-equivariant Detector for Aerial Object Detection** - Affiliation: Wuhan University - Paper: https://arxiv.org/abs/2103.07733 - Code: https://github.com/csuhan/ReDet **25. Beyond Bounding-Box: Convex-Hull Feature Adaptation for Oriented and Densely Packed Object Detection** - Affiliation: University of Chinese Academy of Sciences, Tsinghua University - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Guo_Beyond_Bounding-Box_Convex-Hull_Feature_Adaptation_for_Oriented_and_Densely_Packed_CVPR_2021_paper.html - Code: https://github.com/SDL-GuoZonghao/BeyondBoundingBox ## Few-Shot Object Detection **26. Accurate Few-Shot Object Detection With Support-Query Mutual Guidance and Hybrid Loss** - Affiliation: Fudan University, Tongji University, Zhejiang University - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhang_Accurate_Few-Shot_Object_Detection_With_Support-Query_Mutual_Guidance_and_Hybrid_CVPR_2021_paper.html - Code: None **27. Adaptive Image Transformer for One-Shot Object Detection** - Affiliation: Academia Sinica, Taiwan AI Labs - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Adaptive_Image_Transformer_for_One-Shot_Object_Detection_CVPR_2021_paper.html - Code: None **28. Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection** - Affiliation: Peking University, Beijing University of Posts and Telecommunications - Paper: https://arxiv.org/abs/2103.17115 - Code: https://github.com/hzhupku/DCNet **29. Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection** - Affiliation: Carnegie Mellon University (CMU) - Paper: https://arxiv.org/abs/2103.01903 - Code: None **30. 
FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding** - Affiliation: University of Southern California, Megvii - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sun_FSCE_Few-Shot_Object_Detection_via_Contrastive_Proposal_Encoding_CVPR_2021_paper.html - Code: https://github.com/MegviiDetection/FSCE **31. Hallucination Improves Few-Shot Object Detection** - Affiliation: University of Illinois Urbana-Champaign - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhang_Hallucination_Improves_Few-Shot_Object_Detection_CVPR_2021_paper.html - Code: https://github.com/pppplin/HallucFsDet **32. Few-Shot Object Detection via Classification Refinement and Distractor Retreatment** - Affiliation: National University of Singapore, SIMTech - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Few-Shot_Object_Detection_via_Classification_Refinement_and_Distractor_Retreatment_CVPR_2021_paper.html - Code: None **33. Generalized Few-Shot Object Detection Without Forgetting** - Affiliation: Megvii - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Fan_Generalized_Few-Shot_Object_Detection_Without_Forgetting_CVPR_2021_paper.html - Code: None **34. Transformation Invariant Few-Shot Object Detection** - Affiliation: Huawei Noah's Ark Lab - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Transformation_Invariant_Few-Shot_Object_Detection_CVPR_2021_paper.html - Code: None **35. UniT: Unified Knowledge Transfer for Any-Shot Object Detection and Segmentation** - Affiliation: University of British Columbia, Vector AI, CIFAR AI Chair - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Khandelwal_UniT_Unified_Knowledge_Transfer_for_Any-Shot_Object_Detection_and_Segmentation_CVPR_2021_paper.html - Code: https://github.com/ubc-vision/UniT **36. Beyond Max-Margin: Class Margin Equilibrium for Few-Shot Object Detection** - Affiliation: University of Chinese Academy of Sciences, Xiamen University, Peng Cheng Laboratory - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Beyond_Max-Margin_Class_Margin_Equilibrium_for_Few-Shot_Object_Detection_CVPR_2021_paper.html - Code: https://github.com/Bohao-Lee/CME ## Semi-Supervised Object Detection **37. 
Points As Queries: Weakly Semi-Supervised Object Detection by Points** - Affiliation: Megvii, Fudan University - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Points_As_Queries_Weakly_Semi-Supervised_Object_Detection_by_Points_CVPR_2021_paper.html - Code: None **38. Data-Uncertainty Guided Multi-Phase Learning for Semi-Supervised Object Detection** - Affiliation: Tsinghua University - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Data-Uncertainty_Guided_Multi-Phase_Learning_for_Semi-Supervised_Object_Detection_CVPR_2021_paper.html - Code: None **39. Positive-Unlabeled Data Purification in the Wild for Object Detection** - Affiliation: Huawei Noah's Ark Lab, University of Sydney, Peking University - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Guo_Positive-Unlabeled_Data_Purification_in_the_Wild_for_Object_Detection_CVPR_2021_paper.html - Code: None **40. Interactive Self-Training With Mean Teachers for Semi-Supervised Object Detection** - Affiliation: Alibaba, The Hong Kong Polytechnic University - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Yang_Interactive_Self-Training_With_Mean_Teachers_for_Semi-Supervised_Object_Detection_CVPR_2021_paper.html - Code: None **41. Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework** - Affiliation: Alibaba - Paper: https://arxiv.org/abs/2103.11402 - Code: None **42. Humble Teachers Teach Better Students for Semi-Supervised Object Detection** - Affiliation: Carnegie Mellon University (CMU), Amazon - Homepage: https://yihet.com/humble-teacher - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Tang_Humble_Teachers_Teach_Better_Students_for_Semi-Supervised_Object_Detection_CVPR_2021_paper.html - Code: https://github.com/lryta/HumbleTeacher **43. Interpolation-Based Semi-Supervised Learning for Object Detection** - Affiliation: Seoul National University, Aalto University, et al. - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Jeong_Interpolation-Based_Semi-Supervised_Learning_for_Object_Detection_CVPR_2021_paper.html - Code: https://github.com/soo89/ISD-SSD ## Domain-Adaptive Object Detection **44. 
Domain-Specific Suppression for Adaptive Object Detection** - Affiliation: Chinese Academy of Sciences, Cambricon, University of Chinese Academy of Sciences - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Domain-Specific_Suppression_for_Adaptive_Object_Detection_CVPR_2021_paper.html - Code: None **45. MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection** - Affiliation: Johns Hopkins University, Mercedes-Benz - Paper: https://arxiv.org/abs/2103.04224 - Code: None **46. Unbiased Mean Teacher for Cross-Domain Object Detection** - Affiliation: University of Electronic Science and Technology of China, ETH Zurich - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Deng_Unbiased_Mean_Teacher_for_Cross-Domain_Object_Detection_CVPR_2021_paper.html - Code: https://github.com/kinredon/umt **47. I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors** - Affiliation: The University of Hong Kong, Xiamen University, Deepwise AI Lab - Paper: https://arxiv.org/abs/2103.13757 - Code: None ## Self-Supervised Object Detection **48. There Is More Than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking With Sound by Distilling Multimodal Knowledge** - Affiliation: University of Freiburg - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Valverde_There_Is_More_Than_Meets_the_Eye_Self-Supervised_Multi-Object_Detection_CVPR_2021_paper.html - Code: http://rl.uni-freiburg.de/research/multimodal-distill **49. Instance Localization for Self-supervised Detection Pretraining** - Affiliation: The Chinese University of Hong Kong, Microsoft Research Asia - Paper: https://arxiv.org/abs/2102.08318 - Code: https://github.com/limbo0000/InstanceLoc ## Weakly Supervised Object Detection **50. Informative and Consistent Correspondence Mining for Cross-Domain Weakly Supervised Object Detection** - Affiliation: Beihang University, Peng Cheng Laboratory, SenseTime - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hou_Informative_and_Consistent_Correspondence_Mining_for_Cross-Domain_Weakly_Supervised_Object_CVPR_2021_paper.html - Code: None **51. 
DAP: Detection-Aware Pre-training with Weak Supervision** - Affiliation: UIUC, Microsoft - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhong_DAP_Detection-Aware_Pre-Training_With_Weak_Supervision_CVPR_2021_paper.html - Code: None ## Others **52. Open-Vocabulary Object Detection Using Captions** - Affiliation: Snap, Columbia University - Paper(Oral): https://openaccess.thecvf.com/content/CVPR2021/html/Zareian_Open-Vocabulary_Object_Detection_Using_Captions_CVPR_2021_paper.html - Code: https://github.com/alirezazareian/ovr-cnn **53. Depth From Camera Motion and Object Detection** - Affiliation: University of Michigan, SIAI - Paper: https://arxiv.org/abs/2103.01468 - Code: https://github.com/griffbr/ODMD - Dataset: https://github.com/griffbr/ODMD **54. Unsupervised Object Detection With LIDAR Clues** - Affiliation: SenseTime, University of Chinese Academy of Sciences, University of Science and Technology of China - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Tian_Unsupervised_Object_Detection_With_LIDAR_Clues_CVPR_2021_paper.html - Code: None **55. GAIA: A Transfer Learning System of Object Detection That Fits Your Needs** - Affiliation: University of Chinese Academy of Sciences, Beijing Institute of Technology, Chinese Academy of Sciences, SenseTime - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Bu_GAIA_A_Transfer_Learning_System_of_Object_Detection_That_Fits_CVPR_2021_paper.html - Code: https://github.com/GAIA-vision/GAIA-det **56. General Instance Distillation for Object Detection** - Affiliation: Megvii, Beihang University - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Dai_General_Instance_Distillation_for_Object_Detection_CVPR_2021_paper.html - Code: None **57. AQD: Towards Accurate Quantized Object Detection** - Affiliation: Monash University, University of Adelaide, South China University of Technology - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_AQD_Towards_Accurate_Quantized_Object_Detection_CVPR_2021_paper.html - Code: https://github.com/aim-uofa/model-quantization **58. 
Scale-Aware Automatic Augmentation for Object Detection**

- Affiliations: The Chinese University of Hong Kong, ByteDance AI Lab, SmartMore
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Scale-Aware_Automatic_Augmentation_for_Object_Detection_CVPR_2021_paper.html
- Code: https://github.com/Jia-Research-Lab/SA-AutoAug

**59. Equalization Loss v2: A New Gradient Balance Approach for Long-Tailed Object Detection**

- Affiliations: Tongji University, SenseTime, Tsinghua University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Tan_Equalization_Loss_v2_A_New_Gradient_Balance_Approach_for_Long-Tailed_CVPR_2021_paper.html
- Code: https://github.com/tztztztztz/eqlv2

**60. Class-Aware Robust Adversarial Training for Object Detection**

- Affiliations: Columbia University, Academia Sinica
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Class-Aware_Robust_Adversarial_Training_for_Object_Detection_CVPR_2021_paper.html
- Code: None

**61. Improved Handling of Motion Blur in Online Object Detection**

- Affiliations: University College London
- Homepage: http://visual.cs.ucl.ac.uk/pubs/handlingMotionBlur/
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sayed_Improved_Handling_of_Motion_Blur_in_Online_Object_Detection_CVPR_2021_paper.html
- Code: None

**62. Multiple Instance Active Learning for Object Detection**

- Affiliations: University of Chinese Academy of Sciences, Huawei Noah's Ark Lab
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Yuan_Multiple_Instance_Active_Learning_for_Object_Detection_CVPR_2021_paper.html
- Code: https://github.com/yuantn/MI-AOD

**63. Neural Auto-Exposure for High-Dynamic Range Object Detection**

- Affiliations: Algolux, Princeton University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Onzon_Neural_Auto-Exposure_for_High-Dynamic_Range_Object_Detection_CVPR_2021_paper.html
- Code: None

**64. Generalizable Pedestrian Detection: The Elephant in the Room**

- Affiliations: IIAI, Aalto University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hasan_Generalizable_Pedestrian_Detection_The_Elephant_in_the_Room_CVPR_2021_paper.html
- Code: https://github.com/hasanirtiza/Pedestron

# Single/Multi-Object Tracking

## Single-Object Tracking

**LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search**

- Paper: https://arxiv.org/abs/2104.14545
- Code: https://github.com/researchmm/LightTrack

**Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark**

- Homepage: https://sites.google.com/view/langtrackbenchmark/
- Paper: https://arxiv.org/abs/2103.16746
- Evaluation Toolkit: https://github.com/wangxiao5791509/TNL2K_evaluation_toolkit
- Demo Video: https://www.youtube.com/watch?v=7lvVDlkkff0&ab_channel=XiaoWang

**IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking**

- Paper: https://arxiv.org/abs/2103.14938
- Code: https://github.com/VISION-SJTU/IoUattack

**Graph Attention Tracking**

- Paper: https://arxiv.org/abs/2011.11204
- Code: https://github.com/ohhhyeahhh/SiamGAT

**Rotation Equivariant Siamese Networks for Tracking**

- Paper: https://arxiv.org/abs/2012.13078
- Code: None

**Track to Detect and Segment: An Online Multi-Object Tracker**

- Homepage: https://jialianwu.com/projects/TraDeS.html
- Paper: https://arxiv.org/abs/2103.08808
- Code: https://github.com/JialianW/TraDeS

**Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking**

- Paper(Oral): https://arxiv.org/abs/2103.11681
- Code: https://github.com/594422814/TransformerTrack

**Transformer Tracking**

- Paper: https://arxiv.org/abs/2103.15436
- Code: https://github.com/chenxin-dlut/TransT

## Multi-Object Tracking

**Tracking Pedestrian Heads in Dense Crowd**

- Homepage: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sundararaman_Tracking_Pedestrian_Heads_in_Dense_Crowd_CVPR_2021_paper.html
- Code1:
https://github.com/Sentient07/HeadHunter
- Code2: https://github.com/Sentient07/HeadHunter--T
- Dataset: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/

**Multiple Object Tracking with Correlation Learning**

- Paper: https://arxiv.org/abs/2104.03541
- Code: None

**Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking**

- Paper: https://arxiv.org/abs/2012.02337
- Code: None

**Learning a Proposal Classifier for Multiple Object Tracking**

- Paper: https://arxiv.org/abs/2103.07889
- Code: https://github.com/daip13/LPC_MOT.git

**Track to Detect and Segment: An Online Multi-Object Tracker**

- Homepage: https://jialianwu.com/projects/TraDeS.html
- Paper: https://arxiv.org/abs/2103.08808
- Code: https://github.com/JialianW/TraDeS

# Semantic Segmentation

**1. HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation**

- Affiliations: Facebook AI, Bar-Ilan University, Tel Aviv University
- Homepage: https://nirkin.com/hyperseg/
- Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Nirkin_HyperSeg_Patch-Wise_Hypernetwork_for_Real-Time_Semantic_Segmentation_CVPR_2021_paper.pdf
- Code: https://github.com/YuvalNirkin/hyperseg

**2. Rethinking BiSeNet For Real-time Semantic Segmentation**

- Affiliations: Meituan
- Paper: https://arxiv.org/abs/2104.13188
- Code: https://github.com/MichaelFan01/STDC-Seg

**3. Progressive Semantic Segmentation**

- Affiliations: VinAI Research, VinUniversity, University of Arkansas, Stony Brook University
- Paper: https://arxiv.org/abs/2104.03778
- Code: https://github.com/VinAIResearch/MagNet

**4. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers**

- Affiliations: Fudan University, University of Oxford, University of Surrey, Tencent Youtu Lab, Facebook AI
- Homepage: https://fudan-zvg.github.io/SETR
- Paper: https://arxiv.org/abs/2012.15840
- Code: https://github.com/fudan-zvg/SETR

**5. Capturing Omni-Range Context for Omnidirectional Segmentation**

- Affiliations: Karlsruhe Institute of Technology, Carl Zeiss, Huawei
- Paper: https://arxiv.org/abs/2103.05687
- Code: None

**6.
Learning Statistical Texture for Semantic Segmentation**

- Affiliations: Beihang University, SenseTime
- Paper: https://arxiv.org/abs/2103.04133
- Code: None

**7. InverseForm: A Loss Function for Structured Boundary-Aware Segmentation**

- Affiliations: Qualcomm AI Research
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Borse_InverseForm_A_Loss_Function_for_Structured_Boundary-Aware_Segmentation_CVPR_2021_paper.html
- Code: None

**8. DCNAS: Densely Connected Neural Architecture Search for Semantic Image Segmentation**

- Affiliations: Joyy Inc, Kuaishou, Beihang University, et al.
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhang_DCNAS_Densely_Connected_Neural_Architecture_Search_for_Semantic_Image_Segmentation_CVPR_2021_paper.html
- Code: None

## Weakly Supervised Semantic Segmentation

**9. Railroad Is Not a Train: Saliency As Pseudo-Pixel Supervision for Weakly Supervised Semantic Segmentation**

- Affiliations: Yonsei University, Sungkyunkwan University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Lee_Railroad_Is_Not_a_Train_Saliency_As_Pseudo-Pixel_Supervision_for_CVPR_2021_paper.html
- Code: https://github.com/halbielee/EPS

**10. Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation**

- Affiliations: Yonsei University
- Homepage: https://cvlab.yonsei.ac.kr/projects/BANA/
- Paper: https://arxiv.org/abs/2104.00905
- Code: None

**11. Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation**

- Affiliations: Nanjing University of Science and Technology, MBZUAI, University of Electronic Science and Technology of China, University of Adelaide, University of Technology Sydney
- Paper: https://arxiv.org/abs/2103.14581
- Code: https://github.com/NUST-Machine-Intelligence-Laboratory/nsrom

**12. Embedded Discriminative Attention Mechanism for Weakly Supervised Semantic Segmentation**

- Affiliations: Beijing Institute of Technology, Meituan
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wu_Embedded_Discriminative_Attention_Mechanism_for_Weakly_Supervised_Semantic_Segmentation_CVPR_2021_paper.html
- Code: https://github.com/allenwu97/EDAM

**13.
BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation**

- Affiliations: Seoul National University
- Paper: https://arxiv.org/abs/2103.08907
- Code: https://github.com/jbeomlee93/BBAM

## Semi-Supervised Semantic Segmentation

**14. Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision**

- Affiliations: Peking University, Microsoft Research Asia
- Paper: https://arxiv.org/abs/2106.01226
- Code: https://github.com/charlesCXK/TorchSemiSeg

**15. Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation**

- Affiliations: Huawei, Dalian University of Technology, Peking University
- Paper: https://arxiv.org/abs/2103.04705
- Code: None

**16. Semi-Supervised Semantic Segmentation With Directional Context-Aware Consistency**

- Affiliations: The Chinese University of Hong Kong, SmartMore, University of Oxford
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Lai_Semi-Supervised_Semantic_Segmentation_With_Directional_Context-Aware_Consistency_CVPR_2021_paper.html
- Code: None

**17. Semantic Segmentation With Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization**

- Affiliations: NVIDIA, University of Toronto, Yale University, MIT, Vector Institute
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Semantic_Segmentation_With_Generative_Models_Semi-Supervised_Learning_and_Strong_Out-of-Domain_CVPR_2021_paper.html
- Code: https://nv-tlabs.github.io/semanticGAN/

**18. Three Ways To Improve Semantic Segmentation With Self-Supervised Depth Estimation**

- Affiliations: ETH Zurich, University of Bern, KU Leuven
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hoyer_Three_Ways_To_Improve_Semantic_Segmentation_With_Self-Supervised_Depth_Estimation_CVPR_2021_paper.html
- Code: https://github.com/lhoyer/improving_segmentation_with_selfsupervised_depth

## Domain Adaptive Semantic Segmentation

**19. Cluster, Split, Fuse, and Update: Meta-Learning for Open Compound Domain Adaptive Semantic Segmentation**

- Affiliations: ETH Zurich, KU Leuven, University of Electronic Science and Technology of China
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Gong_Cluster_Split_Fuse_and_Update_Meta-Learning_for_Open_Compound_Domain_CVPR_2021_paper.html
- Code: None

**20.
Source-Free Domain Adaptation for Semantic Segmentation**

- Affiliations: East China Normal University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_Source-Free_Domain_Adaptation_for_Semantic_Segmentation_CVPR_2021_paper.html
- Code: None

**21. Uncertainty Reduction for Model Adaptation in Semantic Segmentation**

- Affiliations: Idiap Research Institute, EPFL, University of Geneva
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/S_Uncertainty_Reduction_for_Model_Adaptation_in_Semantic_Segmentation_CVPR_2021_paper.html
- Code: https://git.io/JthPp

**22. Self-Supervised Augmentation Consistency for Adapting Semantic Segmentation**

- Affiliations: TU Darmstadt, hessian.AI
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Araslanov_Self-Supervised_Augmentation_Consistency_for_Adapting_Semantic_Segmentation_CVPR_2021_paper.html
- Code: https://github.com/visinf/da-sac

**23. RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening**

- Affiliations: LG AI Research, KAIST, et al.
- Paper: https://arxiv.org/abs/2103.15597
- Code: https://github.com/shachoi/RobustNet

**24. Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization**

- Affiliations: The University of Hong Kong, Deepwise
- Paper: https://arxiv.org/abs/2103.13041
- Code: None

**25. MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation**

- Affiliations: City University of Hong Kong, Baidu
- Paper: https://arxiv.org/abs/2103.05254
- Code: https://github.com/cyang-cityu/MetaCorrection

**26. Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation**

- Affiliations: Huawei Cloud, Huawei Noah's Ark Lab, Dalian University of Technology
- Paper: https://arxiv.org/abs/2103.04717
- Code: None

**27. Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation**

- Affiliations: University of Science and Technology of China, Microsoft Research Asia
- Paper: https://arxiv.org/abs/2101.10979
- Code: https://github.com/microsoft/ProDA

**28.
DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation**

- Affiliations: University of South Carolina, Farsee2 Technology
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wu_DANNet_A_One-Stage_Domain_Adaptation_Network_for_Unsupervised_Nighttime_Semantic_CVPR_2021_paper.html
- Code: https://github.com/W-zx-Y/DANNet

## Few-Shot Semantic Segmentation

**29. Scale-Aware Graph Neural Network for Few-Shot Semantic Segmentation**

- Affiliations: MBZUAI, IIAI, Harbin Institute of Technology
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Xie_Scale-Aware_Graph_Neural_Network_for_Few-Shot_Semantic_Segmentation_CVPR_2021_paper.html
- Code: None

**30. Anti-Aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation**

- Affiliations: University of Chinese Academy of Sciences, Tsinghua University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_Anti-Aliasing_Semantic_Reconstruction_for_Few-Shot_Semantic_Segmentation_CVPR_2021_paper.html
- Code: https://github.com/Bibkiller/ASR

## Unsupervised Semantic Segmentation

**31. PiCIE: Unsupervised Semantic Segmentation Using Invariance and Equivariance in Clustering**

- Affiliations: UT-Austin, Cornell University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Cho_PiCIE_Unsupervised_Semantic_Segmentation_Using_Invariance_and_Equivariance_in_Clustering_CVPR_2021_paper.html
- Code: https://github.com/janghyuncho/PiCIE

## Video Semantic Segmentation

**32. VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild**

- Affiliations: Zhejiang University, Baidu, University of Technology Sydney
- Homepage: https://www.vspwdataset.com/
- Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf
- GitHub: https://github.com/sssdddwww2/vspw_dataset_download

## Others

**33. Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations**

- Affiliations: University of Padova
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Michieli_Continual_Semantic_Segmentation_via_Repulsion-Attraction_of_Sparse_and_Disentangled_Latent_CVPR_2021_paper.html
- Code: https://lttm.dei.unipd.it/paper_data/SDR/

**34.
Exploit Visual Dependency Relations for Semantic Segmentation**

- Affiliations: University of Illinois at Chicago
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_Exploit_Visual_Dependency_Relations_for_Semantic_Segmentation_CVPR_2021_paper.html
- Code: None

**35. Revisiting Superpixels for Active Learning in Semantic Segmentation With Realistic Annotation Costs**

- Affiliations: Institute for Infocomm Research, National University of Singapore
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Cai_Revisiting_Superpixels_for_Active_Learning_in_Semantic_Segmentation_With_Realistic_CVPR_2021_paper.html
- Code: None

**36. PLOP: Learning without Forgetting for Continual Semantic Segmentation**

- Affiliations: Sorbonne University, Heuritech, Datakalab, Valeo.ai
- Paper: https://arxiv.org/abs/2011.11390
- Code: https://github.com/arthurdouillard/CVPR2021_PLOP

**37. 3D-to-2D Distillation for Indoor Scene Parsing**

- Affiliations: The Chinese University of Hong Kong, The University of Hong Kong
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_3D-to-2D_Distillation_for_Indoor_Scene_Parsing_CVPR_2021_paper.html
- Code: None

**38. Bidirectional Projection Network for Cross Dimension Scene Understanding**

- Affiliations: The Chinese University of Hong Kong, University of Oxford, et al.
- Paper(Oral): https://arxiv.org/abs/2103.14326
- Code: https://github.com/wbhu/BPNet

**39.
PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation**

- Affiliations: Peking University, Chinese Academy of Sciences, University of Chinese Academy of Sciences, ETH Zurich, SenseTime, et al.
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_PointFlow_Flowing_Semantics_Through_Points_for_Aerial_Image_Segmentation_CVPR_2021_paper.html
- Code: https://github.com/lxtGH/PFSegNets

# Instance Segmentation

**DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation**

- Paper: https://arxiv.org/abs/2011.09876
- Code: https://github.com/aliyun/DCT-Mask

**Incremental Few-Shot Instance Segmentation**

- Paper: https://arxiv.org/abs/2105.05312
- Code: https://github.com/danganea/iMTFA

**A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation**

- Paper: https://arxiv.org/abs/2105.03186
- Code: None

**RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features**

- Paper: https://arxiv.org/abs/2104.08569
- Code: https://github.com/zhanggang001/RefineMask/

**Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation**

- Paper: https://arxiv.org/abs/2104.05239
- Code: https://github.com/tinyalpha/BPR

**Multi-Scale Aligned Distillation for Low-Resolution Detection**

- Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf
- Code: https://github.com/Jia-Research-Lab/MSAD

**Boundary IoU: Improving Object-Centric Image Segmentation Evaluation**

- Homepage: https://bowenc0221.github.io/boundary-iou/
- Paper: https://arxiv.org/abs/2103.16562
- Code: https://github.com/bowenc0221/boundary-iou-api

**Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers**

- Paper: https://arxiv.org/abs/2103.12340
- Code: https://github.com/lkeab/BCNet

**Zero-shot instance segmentation (Not Sure)**

- Paper: None
- Code: https://github.com/CVPR2021-pape-id-1395/CVPR2021-paper-id-1395

## Video Instance Segmentation

**STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation**

- Paper:
http://www4.comp.polyu.edu.hk/~cslzhang/papers.htm
- Code: https://github.com/MinghanLi/STMask

**End-to-End Video Instance Segmentation with Transformers**

- Paper(Oral): https://arxiv.org/abs/2011.14503
- Code: https://github.com/Epiphqny/VisTR

# Panoptic Segmentation

**ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2012.05258
- Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab
- Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab

**Part-aware Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2106.06351
- Code: https://github.com/tue-mps/panoptic_parts
- Dataset: https://github.com/tue-mps/panoptic_parts

**Exemplar-Based Open-Set Panoptic Segmentation Network**

- Homepage: https://cv.snu.ac.kr/research/EOPSN/
- Paper: https://arxiv.org/abs/2105.08336
- Code: https://github.com/jd730/EOPSN

**MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers**

- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_MaX-DeepLab_End-to-End_Panoptic_Segmentation_With_Mask_Transformers_CVPR_2021_paper.html
- Code: None

**Panoptic Segmentation Forecasting**

- Paper: https://arxiv.org/abs/2104.03962
- Code: https://github.com/nianticlabs/panoptic-forecasting

**Fully Convolutional Networks for Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2012.00720
- Code: https://github.com/yanwei-li/PanopticFCN

**Cross-View Regularization for Domain Adaptive Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2103.02584
- Code: None

# Medical Image Segmentation

**1. Learning Calibrated Medical Image Segmentation via Multi-Rater Agreement Modeling**

- Affiliations: Tencent Jarvis Lab, Beijing Tongren Hospital
- Paper(Best Paper Candidate): https://openaccess.thecvf.com/content/CVPR2021/html/Ji_Learning_Calibrated_Medical_Image_Segmentation_via_Multi-Rater_Agreement_Modeling_CVPR_2021_paper.html
- Code: https://github.com/jiwei0921/MRNet/

**2.
Every Annotation Counts: Multi-Label Deep Supervision for Medical Image Segmentation**

- Affiliations: Karlsruhe Institute of Technology, Carl Zeiss, et al.
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Reiss_Every_Annotation_Counts_Multi-Label_Deep_Supervision_for_Medical_Image_Segmentation_CVPR_2021_paper.html
- Code: None

**3. FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space**

- Affiliations: The Chinese University of Hong Kong, The Hong Kong Polytechnic University
- Paper: https://arxiv.org/abs/2103.06030
- Code: https://github.com/liuquande/FedDG-ELCFS

**4. DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation**

- Affiliations: Johns Hopkins University, NVIDIA
- Paper(Oral): https://arxiv.org/abs/2103.15954
- Code: None

**5. DARCNN: Domain Adaptive Region-Based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images**

- Affiliations: Stanford University
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hsu_DARCNN_Domain_Adaptive_Region-Based_Convolutional_Neural_Network_for_Unsupervised_Instance_CVPR_2021_paper.html
- Code: None

# Video Object Segmentation

**Learning Position and Target Consistency for Memory-based Video Object Segmentation**

- Paper: https://arxiv.org/abs/2104.04329
- Code: None

**SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation**

- Paper(Oral): https://arxiv.org/abs/2101.08833
- Code: https://github.com/dukebw/SSTVOS

# Interactive Video Object Segmentation

**Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion**

- Homepage: https://hkchengrex.github.io/MiVOS/
- Paper: https://arxiv.org/abs/2103.07941
- Code: https://github.com/hkchengrex/MiVOS
- Demo: https://hkchengrex.github.io/MiVOS/video.html#partb

**Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild**

- Paper: https://arxiv.org/abs/2103.10391
- Code: https://github.com/svip-lab/IVOS-W

# Saliency Detection

**Uncertainty-aware Joint Salient
Object and Camouflaged Object Detection**

- Paper: https://arxiv.org/abs/2104.02628
- Code: https://github.com/JingZhang617/Joint_COD_SOD

**Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion**

- Paper(Oral): https://arxiv.org/abs/2103.11832
- Code: https://github.com/sunpeng1996/DSA2F

# Camouflaged Object Detection

**Uncertainty-aware Joint Salient Object and Camouflaged Object Detection**

- Paper: https://arxiv.org/abs/2104.02628
- Code: https://github.com/JingZhang617/Joint_COD_SOD

# Co-Salient Object Detection

**Group Collaborative Learning for Co-Salient Object Detection**

- Paper: https://arxiv.org/abs/2104.01108
- Code: https://github.com/fanq15/GCoNet

# Image Matting

**Semantic Image Matting**

- Paper: https://arxiv.org/abs/2104.08201
- Code: https://github.com/nowsyn/SIM
- Dataset: https://github.com/nowsyn/SIM

# Person Re-identification

**Generalizable Person Re-identification with Relevance-aware Mixture of Experts**

- Paper: https://arxiv.org/abs/2105.09156
- Code: None

**Unsupervised Multi-Source Domain Adaptation for Person Re-Identification**

- Paper: https://arxiv.org/abs/2104.12961
- Code: None

**Combined Depth Space based Architecture Search For Person Re-identification**

- Paper: https://arxiv.org/abs/2104.04163
- Code: None

# Person Search

**Anchor-Free Person Search**

- Paper: https://arxiv.org/abs/2103.11617
- Code: https://github.com/daodaofr/AlignPS
- Interpretation: [The first anchor-free person search framework | CVPR 2021](https://mp.weixin.qq.com/s/iqJkgp0JBanmeBPyHUkb-A)

# Video Understanding / Action Recognition

**Temporal-Relational CrossTransformers for Few-Shot Action Recognition**

- Paper: https://arxiv.org/abs/2101.06184
- Code: https://github.com/tobyperrett/trx

**FrameExit: Conditional Early Exiting for Efficient Video Recognition**

- Paper(Oral): https://arxiv.org/abs/2104.13400
- Code: None

**No frame left behind: Full Video Action Recognition**

- Paper:
https://arxiv.org/abs/2103.15395
- Code: None

**Learning Salient Boundary Feature for Anchor-free Temporal Action Localization**

- Paper: https://arxiv.org/abs/2103.13137
- Code: None

**Temporal Context Aggregation Network for Temporal Action Proposal Refinement**

- Paper: https://arxiv.org/abs/2103.13141
- Code: None
- Interpretation: [CVPR 2021 | TCANet: the strongest temporal action proposal refinement network](https://mp.weixin.qq.com/s/UOWMfpTljkyZznHtpkQBhA)

**ACTION-Net: Multipath Excitation for Action Recognition**

- Paper: https://arxiv.org/abs/2103.07372
- Code: https://github.com/V-Sense/ACTION-Net

**Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning**

- Homepage: https://fingerrec.github.io/index_files/jinpeng/papers/CVPR2021/project_website.html
- Paper: https://arxiv.org/abs/2009.05769
- Code: https://github.com/FingerRec/BE

**TDN: Temporal Difference Networks for Efficient Action Recognition**

- Paper: https://arxiv.org/abs/2012.10071
- Code: https://github.com/MCG-NJU/TDN

# Face Recognition

**A 3D GAN for Improved Large-pose Facial Recognition**

- Paper: https://arxiv.org/abs/2012.10545
- Code: None

**MagFace: A Universal Representation for Face Recognition and Quality Assessment**

- Paper(Oral): https://arxiv.org/abs/2103.06627
- Code: https://github.com/IrvingMeng/MagFace

**WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition**

- Homepage: https://www.face-benchmark.org/
- Paper: https://arxiv.org/abs/2103.04098
- Dataset: https://www.face-benchmark.org/

**When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework**

- Paper(Oral): https://arxiv.org/abs/2103.01520
- Code: https://github.com/Hzzone/MTLFace
- Dataset: https://github.com/Hzzone/MTLFace

# Face Detection

**HLA-Face: Joint High-Low Adaptation for Low Light Face Detection**

- Homepage: https://daooshee.github.io/HLA-Face-Website/
- Paper: https://arxiv.org/abs/2104.01984
- Code:
https://github.com/daooshee/HLA-Face-Code

**CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement**

- Paper: https://arxiv.org/abs/2103.07017
- Code: None

# Face Anti-Spoofing

**Cross Modal Focal Loss for RGBD Face Anti-Spoofing**

- Paper: https://arxiv.org/abs/2103.00948
- Code: None

# Deepfake Detection

**Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain**

- Paper: https://arxiv.org/abs/2103.01856
- Code: None

**Multi-attentional Deepfake Detection**

- Paper: https://arxiv.org/abs/2103.02406
- Code: None

# Face Age Estimation

**Continuous Face Aging via Self-estimated Residual Age Embedding**

- Paper: https://arxiv.org/abs/2105.00020
- Code: None

**PML: Progressive Margin Loss for Long-tailed Age Classification**

- Paper: https://arxiv.org/abs/2103.02140
- Code: None

# Facial Expression Recognition

**Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition**

- Paper: https://arxiv.org/abs/2103.13372
- Code: None

# Deepfakes

**MagDR: Mask-guided Detection and Reconstruction for Defending Deepfakes**

- Paper: https://arxiv.org/abs/2103.14211
- Code: None

# Human Parsing

**Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing**

- Paper: https://arxiv.org/abs/2103.04570
- Code: https://github.com/tfzhou/MG-HumanParsing

# 2D/3D Human Pose Estimation

## 2D Human Pose Estimation

**ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search**

- Paper: https://arxiv.org/abs/2105.10154
- Code: None

**When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks**

- Paper: https://arxiv.org/abs/2105.06152
- Code: None

**Pose Recognition with Cascade Transformers**

- Paper: https://arxiv.org/abs/2104.06976
- Code: https://github.com/mlpc-ucsd/PRTR

**DCPose: Deep Dual Consecutive Network for Human Pose Estimation**

- Paper: https://arxiv.org/abs/2103.07254
- Code: https://github.com/Pose-Group/DCPose

## 3D Human Pose Estimation

**End-to-End Human Pose and Mesh Reconstruction with Transformers**

- Paper: https://arxiv.org/abs/2012.09760
- Code: https://github.com/microsoft/MeshTransformer

**PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation**

- Paper(Oral): https://arxiv.org/abs/2105.02465
- Code: https://github.com/jfzhang95/PoseAug

**Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration**

- Paper: https://arxiv.org/abs/2103.02845
- Code: https://github.com/SeanChenxy/HandMesh

**Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks**

- Paper: https://arxiv.org/abs/2104.01797
- Code: https://github.com/3dpose/3D-Multi-Person-Pose

**HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation**

- Homepage: https://jeffli.site/HybrIK/
- Paper: https://arxiv.org/abs/2011.14672
- Code: https://github.com/Jeff-sjtu/HybrIK

# Animal Pose Estimation

**From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation**

- Paper: https://arxiv.org/abs/2103.14843
- Code: None

# Hand Pose Estimation

**Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time**

- Homepage: https://stevenlsw.github.io/Semi-Hand-Object/
- Paper: https://arxiv.org/abs/2106.05266
- Code: https://github.com/stevenlsw/Semi-Hand-Object

# Human Volumetric Capture

**POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture**

- Homepage: http://www.liuyebin.com/posefusion/posefusion.html
- Paper(Oral): https://arxiv.org/abs/2103.15331
- Code: None

# Scene Text Detection

**Fourier Contour Embedding for Arbitrary-Shaped Text Detection**

- Paper: https://arxiv.org/abs/2104.10442
- Code: None

# Scene Text Recognition

**Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition**

- Paper:
https://arxiv.org/abs/2103.06495
- Code: https://github.com/FangShancheng/ABINet

# Image Compression

**Checkerboard Context Model for Efficient Learned Image Compression**

- Paper: https://arxiv.org/abs/2103.15306
- Code: None

**Slimmable Compressive Autoencoders for Practical Neural Image Compression**

- Paper: https://arxiv.org/abs/2103.15726
- Code: None

**Attention-guided Image Compression by Deep Reconstruction of Compressive Sensed Saliency Skeleton**

- Paper: https://arxiv.org/abs/2103.15368
- Code: None

# Model Compression / Pruning / Quantization

**Teachers Do More Than Teach: Compressing Image-to-Image Models**

- Paper: https://arxiv.org/abs/2103.03467
- Code: https://github.com/snap-research/CAT

## Model Pruning

**Dynamic Slimmable Network**

- Paper: https://arxiv.org/abs/2103.13258
- Code: https://github.com/changlin31/DS-Net

## Model Quantization

**Network Quantization with Element-wise Gradient Scaling**

- Paper: https://arxiv.org/abs/2104.00903
- Code: None

**Zero-shot Adversarial Quantization**

- Paper(Oral): https://arxiv.org/abs/2103.15263
- Code: https://git.io/Jqc0y

**Learnable Companding Quantization for Accurate Low-bit Neural Networks**

- Paper: https://arxiv.org/abs/2103.07156
- Code: None

# Knowledge Distillation

**Distilling Knowledge via Knowledge Review**

- Paper: https://arxiv.org/abs/2104.09044
- Code: https://github.com/Jia-Research-Lab/ReviewKD

**Distilling Object Detectors via Decoupled Features**

- Paper: https://arxiv.org/abs/2103.14475
- Code: https://github.com/ggjy/DeFeat.pytorch

# Super-Resolution

**Image Super-Resolution with Non-Local Sparse Attention**

- Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Mei_Image_Super-Resolution_With_Non-Local_Sparse_Attention_CVPR_2021_paper.pdf
- Code: https://github.com/HarukiYqM/Non-Local-Sparse-Attention

**Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline**

- Homepage: http://mepro.bjtu.edu.cn/resource.html
- Paper: https://arxiv.org/abs/2104.06174
- Code: None

**ClassSR: A General
Framework to Accelerate Super-Resolution Networks by Data Characteristic** - Paper: https://arxiv.org/abs/2103.04039 - Code: https://github.com/Xiangtaokong/ClassSR **AdderSR: Towards Energy Efficient Image Super-Resolution** - Paper: https://arxiv.org/abs/2009.08891 - Code: None # 去雾(Dehazing) **Contrastive Learning for Compact Single Image Dehazing** - Paper: https://arxiv.org/abs/2104.09367 - Code: https://github.com/GlassyWu/AECR-Net ## 视频超分辨率 **Temporal Modulation Network for Controllable Space-Time Video Super-Resolution** - Paper: None - Code: https://github.com/CS-GangXu/TMNet # 图像恢复(Image Restoration) **Multi-Stage Progressive Image Restoration** - Paper: https://arxiv.org/abs/2102.02808 - Code: https://github.com/swz30/MPRNet # 图像补全(Image Inpainting) **PD-GAN: Probabilistic Diverse GAN for Image Inpainting** - Paper: https://arxiv.org/abs/2105.02201 - Code: https://github.com/KumapowerLIU/PD-GAN **TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations** - Homepage: https://yzhouas.github.io/projects/TransFill/index.html - Paper: https://arxiv.org/abs/2103.15982 - Code: None # 图像编辑(Image Editing) **StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing** - Paper: https://arxiv.org/abs/2104.14754 - Code: https://github.com/naver-ai/StyleMapGAN - Demo Video: https://youtu.be/qCapNyRA_Ng **High-Fidelity and Arbitrary Face Editing** - Paper: https://arxiv.org/abs/2103.15814 - Code: None **Anycost GANs for Interactive Image Synthesis and Editing** - Paper: https://arxiv.org/abs/2103.03243 - Code: https://github.com/mit-han-lab/anycost-gan **PISE: Person Image Synthesis and Editing with Decoupled GAN** - Paper: https://arxiv.org/abs/2103.04023 - Code: https://github.com/Zhangjinso/PISE **DeFLOCNet: Deep Image Editing via Flexible Low-level Controls** - Paper: http://raywzy.com/ - Code: http://raywzy.com/ **Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing** - 
Paper: None - Code: None # 图像描述(Image Captioning) **Towards Accurate Text-based Image Captioning with Content Diversity Exploration** - Paper: https://arxiv.org/abs/2105.03236 - Code: None # 字体生成(Font Generation) **DG-Font: Deformable Generative Networks for Unsupervised Font Generation** - Paper: https://arxiv.org/abs/2104.03064 - Code: https://github.com/ecnuycxie/DG-Font # 图像匹配(Image Matching) **LoFTR: Detector-Free Local Feature Matching with Transformers** - Homepage: https://zju3dv.github.io/loftr/ - Paper: https://arxiv.org/abs/2104.00680 - Code: https://github.com/zju3dv/LoFTR **Convolutional Hough Matching Networks** - Homepage: http://cvlab.postech.ac.kr/research/CHM/ - Paper(Oral): https://arxiv.org/abs/2103.16831 - Code: None # 图像融合(Image Blending) **Bridging the Visual Gap: Wide-Range Image Blending** - Paper: https://arxiv.org/abs/2103.15149 - Code: https://github.com/julia0607/Wide-Range-Image-Blending # 反光去除(Reflection Removal) **Robust Reflection Removal with Reflection-free Flash-only Cues** - Paper: https://arxiv.org/abs/2103.04273 - Code: https://github.com/ChenyangLEI/flash-reflection-removal # 3D点云分类(3D Point Clouds Classification) **Equivariant Point Network for 3D Point Cloud Analysis** - Paper: https://arxiv.org/abs/2103.14147 - Code: None **PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds** - Paper: https://arxiv.org/abs/2103.14635 - Code: https://github.com/CVMI-Lab/PAConv # 3D目标检测(3D Object Detection) **3D-MAN: 3D Multi-frame Attention Network for Object Detection** - Paper: https://arxiv.org/abs/2103.16054 - Code: None **Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds** - Paper: https://arxiv.org/abs/2104.06114 - Code: https://github.com/cheng052/BRNet **HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection** - Homepage: https://cvlab.yonsei.ac.kr/projects/HVPR/ - Paper: https://arxiv.org/abs/2104.00902 - Code:
https://github.com/cvlab-yonsei/HVPR **LiDAR R-CNN: An Efficient and Universal 3D Object Detector** - Paper: https://arxiv.org/abs/2103.15297 - Code: https://github.com/tusimple/LiDAR_RCNN **M3DSSD: Monocular 3D Single Stage Object Detector** - Paper: https://arxiv.org/abs/2103.13164 - Code: https://github.com/mumianyuxin/M3DSSD **SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud** - Paper: None - Code: https://github.com/Vegeta2020/SE-SSD **Center-based 3D Object Detection and Tracking** - Paper: https://arxiv.org/abs/2006.11275 - Code: https://github.com/tianweiy/CenterPoint **Categorical Depth Distribution Network for Monocular 3D Object Detection** - Paper: https://arxiv.org/abs/2103.01100 - Code: None # 3D语义分割(3D Semantic Segmentation) **Bidirectional Projection Network for Cross Dimension Scene Understanding** - Paper(Oral): https://arxiv.org/abs/2103.14326 - Code: https://github.com/wbhu/BPNet **Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion** - Paper: https://arxiv.org/abs/2103.07074 - Code: https://github.com/ShiQiu0419/BAAF-Net **Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation** - Paper: https://arxiv.org/abs/2011.10033 - Code: https://github.com/xinge008/Cylinder3D **Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges** - Homepage: https://github.com/QingyongHu/SensatUrban - Paper: http://arxiv.org/abs/2009.03137 - Code: https://github.com/QingyongHu/SensatUrban - Dataset: https://github.com/QingyongHu/SensatUrban # 3D全景分割(3D Panoptic Segmentation) **Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation** - Paper: https://arxiv.org/abs/2103.14962 - Code: https://github.com/edwardzhou130/Panoptic-PolarNet # 3D目标跟踪(3D Object Tracking) **Center-based 3D Object Detection and Tracking** - Paper: https://arxiv.org/abs/2006.11275 - Code: https://github.com/tianweiy/CenterPoint # 3D点云配准(3D Point
Cloud Registration) **ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning** - Paper: https://arxiv.org/abs/2103.15231 - Code: None **PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency** - Paper: https://arxiv.org/abs/2103.05465 - Code: https://github.com/XuyangBai/PointDSC **PREDATOR: Registration of 3D Point Clouds with Low Overlap** - Paper: https://arxiv.org/abs/2011.13005 - Code: https://github.com/ShengyuH/OverlapPredator # 3D点云补全(3D Point Cloud Completion) **Unsupervised 3D Shape Completion through GAN Inversion** - Homepage: https://junzhezhang.github.io/projects/ShapeInversion/ - Paper: https://arxiv.org/abs/2104.13366 - Code: https://github.com/junzhezhang/shape-inversion **Variational Relational Point Completion Network** - Homepage: https://paul007pl.github.io/projects/VRCNet - Paper: https://arxiv.org/abs/2104.10154 - Code: https://github.com/paul007pl/VRCNet **Style-based Point Generator with Adversarial Rendering for Point Cloud Completion** - Homepage: https://alphapav.github.io/SpareNet/ - Paper: https://arxiv.org/abs/2103.02535 - Code: https://github.com/microsoft/SpareNet # 3D重建(3D Reconstruction) **Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection** - Paper: http://arxiv.org/abs/2106.07852 - Code: https://github.com/TencentYoutuResearch/3DFaceReconstruction-LAP **Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction** - Paper: https://arxiv.org/abs/2104.00858 - Code: None **NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video** - Homepage: https://zju3dv.github.io/neuralrecon/ - Paper(Oral): https://arxiv.org/abs/2104.00681 - Code: https://github.com/zju3dv/NeuralRecon # 6D位姿估计(6D Pose Estimation) **FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism** - Paper(Oral): https://arxiv.org/abs/2103.07054 - Code: https://github.com/DC1991/FS-Net **GDR-Net: Geometry-Guided 
Direct Regression Network for Monocular 6D Object Pose Estimation** - Paper: http://arxiv.org/abs/2102.12145 - Code: https://git.io/GDR-Net **FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation** - Paper: https://arxiv.org/abs/2103.02242 - Code: https://github.com/ethnhe/FFB6D # Camera Pose Estimation **Back to the Feature: Learning Robust Camera Localization from Pixels to Pose** - Paper: https://arxiv.org/abs/2103.09213 - Code: https://github.com/cvg/pixloc # 深度估计(Depth Estimation) **S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation** - Paper(Oral): https://arxiv.org/abs/2104.00877 - Code: None **Beyond Image to Depth: Improving Depth Prediction using Echoes** - Homepage: https://krantiparida.github.io/projects/bimgdepth.html - Paper: https://arxiv.org/abs/2103.08468 - Code: https://github.com/krantiparida/beyond-image-to-depth **S3: Learnable Sparse Signal Superdensity for Guided Depth Estimation** - Paper: https://arxiv.org/abs/2103.02396 - Code: None **Depth from Camera Motion and Object Detection** - Paper: https://arxiv.org/abs/2103.01468 - Code: https://github.com/griffbr/ODMD - Dataset: https://github.com/griffbr/ODMD # 立体匹配(Stereo Matching) **A Decomposition Model for Stereo Matching** - Paper: https://arxiv.org/abs/2104.07516 - Code: None # 光流估计(Flow Estimation) **Self-Supervised Multi-Frame Monocular Scene Flow** - Paper: https://arxiv.org/abs/2105.02216 - Code: https://github.com/visinf/multi-mono-sf **RAFT-3D: Scene Flow using Rigid-Motion Embeddings** - Paper: https://arxiv.org/abs/2012.00726v1 - Code: None **Learning Optical Flow From Still Images** - Homepage: https://mattpoggi.github.io/projects/cvpr2021aleotti/ - Paper: https://mattpoggi.github.io/assets/papers/aleotti2021cvpr.pdf - Code: https://github.com/mattpoggi/depthstillation **FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds** - Paper: https://arxiv.org/abs/2104.00798 - Code: None # 车道线检测(Lane Detection) **Focus on Local:
Detecting Lane Marker from Bottom Up via Key Point** - Paper: https://arxiv.org/abs/2105.13680 - Code: None **Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection** - Paper: https://arxiv.org/abs/2010.12035 - Code: https://github.com/lucastabelini/LaneATT # 轨迹预测(Trajectory Prediction) **Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction** - Paper(Oral): https://arxiv.org/abs/2104.08277 - Code: None # 人群计数(Crowd Counting) **Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark** - Paper: https://arxiv.org/abs/2105.02440 - Code: https://github.com/VisDrone/DroneCrowd - Dataset: https://github.com/VisDrone/DroneCrowd # 对抗样本(Adversarial Examples) **Enhancing the Transferability of Adversarial Attacks through Variance Tuning** - Paper: https://arxiv.org/abs/2103.15571 - Code: https://github.com/JHL-HUST/VT **LiBRe: A Practical Bayesian Approach to Adversarial Detection** - Paper: https://arxiv.org/abs/2103.14835 - Code: None **Natural Adversarial Examples** - Paper: https://arxiv.org/abs/1907.07174 - Code: https://github.com/hendrycks/natural-adv-examples # 图像检索(Image Retrieval) **StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval** - Paper: https://arxiv.org/abs/2103.15706 - Code: None **QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval** - Paper: https://arxiv.org/abs/2103.02927 - Code: None # 视频检索(Video Retrieval) **On Semantic Similarity in Video Retrieval** - Paper: https://arxiv.org/abs/2103.10095 - Homepage: https://mwray.github.io/SSVR/ - Code: https://github.com/mwray/Semantic-Video-Retrieval # 跨模态检索(Cross-modal Retrieval) **Cross-Modal Center Loss for 3D Cross-Modal Retrieval** - Paper: https://arxiv.org/abs/2008.03561 - Code: https://github.com/LongLong-Jing/Cross-Modal-Center-Loss **Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers** - Paper: https://arxiv.org/abs/2103.16553 - Code: None **Revamping cross-modal recipe retrieval with hierarchical
Transformers and self-supervised learning** - Paper: https://www.amazon.science/publications/revamping-cross-modal-recipe-retrieval-with-hierarchical-transformers-and-self-supervised-learning - Code: https://github.com/amzn/image-to-recipe-transformers # Zero-Shot Learning **Counterfactual Zero-Shot and Open-Set Visual Recognition** - Paper: https://arxiv.org/abs/2103.00887 - Code: https://github.com/yue-zhongqi/gcm-cf # 联邦学习(Federated Learning) **FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space** - Paper: https://arxiv.org/abs/2103.06030 - Code: https://github.com/liuquande/FedDG-ELCFS # 视频插帧(Video Frame Interpolation) **CDFI: Compression-Driven Network Design for Frame Interpolation** - Paper: None - Code: https://github.com/tding1/CDFI **FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation** - Homepage: https://tarun005.github.io/FLAVR/ - Paper: https://arxiv.org/abs/2012.08512 - Code: https://github.com/tarun005/FLAVR # 视觉推理(Visual Reasoning) **Transformation Driven Visual Reasoning** - Homepage: https://hongxin2019.github.io/TVR/ - Paper: https://arxiv.org/abs/2011.13160 - Code: https://github.com/hughplay/TVR # 图像合成(Image Synthesis) **GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields** - Homepage: https://m-niemeyer.github.io/project-pages/giraffe/index.html - Paper(Oral): https://arxiv.org/abs/2011.12100 - Code: https://github.com/autonomousvision/giraffe - Demo: http://www.youtube.com/watch?v=fIaDXC-qRSg&vq=hd1080&autoplay=1 **Taming Transformers for High-Resolution Image Synthesis** - Homepage: https://compvis.github.io/taming-transformers/ - Paper(Oral): https://arxiv.org/abs/2012.09841 - Code: https://github.com/CompVis/taming-transformers # 视图合成(View Synthesis) **Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes** - Homepage: https://virtualhumans.mpi-inf.mpg.de/srf/ - Paper:
https://arxiv.org/abs/2104.06935 **Self-Supervised Visibility Learning for Novel View Synthesis** - Paper: https://arxiv.org/abs/2103.15407 - Code: None **NeX: Real-time View Synthesis with Neural Basis Expansion** - Homepage: https://nex-mpi.github.io/ - Paper(Oral): https://arxiv.org/abs/2103.05606 # 风格迁移(Style Transfer) **Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer** - Paper: https://arxiv.org/abs/2104.05376 - Code: https://github.com/PaddlePaddle/PaddleGAN/ # 布局生成(Layout Generation) **LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity** - Paper: None - Code: None **Variational Transformer Networks for Layout Generation** - Paper: https://arxiv.org/abs/2104.02416 - Code: None # Domain Generalization **Generalization on Unseen Domains via Inference-time Label-Preserving Target Projections** - Paper(Oral): https://openaccess.thecvf.com/content/CVPR2021/papers/Pandey_Generalization_on_Unseen_Domains_via_Inference-Time_Label-Preserving_Target_Projections_CVPR_2021_paper.pdf - Code: https://github.com/VSumanth99/InferenceTimeDG **Generalizable Person Re-identification with Relevance-aware Mixture of Experts** - Paper: https://arxiv.org/abs/2105.09156 - Code: None **RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening** - Paper: https://arxiv.org/abs/2103.15597 - Code: https://github.com/shachoi/RobustNet **Adaptive Methods for Real-World Domain Generalization** - Paper: https://arxiv.org/abs/2103.15796 - Code: None **FSDR: Frequency Space Domain Randomization for Domain Generalization** - Paper: https://arxiv.org/abs/2103.02370 - Code: None # Domain Adaptation **Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation** - Paper: https://arxiv.org/abs/2104.00808 - Code: None **Domain Consensus Clustering for Universal Domain Adaptation** - Paper: http://reler.net/papers/guangrui_cvpr2021.pdf - Code: 
https://github.com/Solacex/Domain-Consensus-Clustering # Open-Set **Towards Open World Object Detection** - Paper(Oral): https://arxiv.org/abs/2103.02603 - Code: https://github.com/JosephKJ/OWOD **Exemplar-Based Open-Set Panoptic Segmentation Network** - Homepage: https://cv.snu.ac.kr/research/EOPSN/ - Paper: https://arxiv.org/abs/2105.08336 - Code: https://github.com/jd730/EOPSN **Learning Placeholders for Open-Set Recognition** - Paper(Oral): https://arxiv.org/abs/2103.15086 - Code: None # Adversarial Attack **IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking** - Paper: https://arxiv.org/abs/2103.14938 - Code: https://github.com/VISION-SJTU/IoUattack # Human-Object Interaction (HOI) Detection **HOTR: End-to-End Human-Object Interaction Detection with Transformers** - Paper: https://arxiv.org/abs/2104.13682 - Code: None **Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information** - Paper: https://arxiv.org/abs/2103.05399 - Code: https://github.com/hitachi-rd-cv/qpic **Reformulating HOI Detection as Adaptive Set Prediction** - Paper: https://arxiv.org/abs/2103.05983 - Code: https://github.com/yoyomimi/AS-Net **Detecting Human-Object Interaction via Fabricated Compositional Learning** - Paper: https://arxiv.org/abs/2103.08214 - Code: https://github.com/zhihou7/FCL **End-to-End Human Object Interaction Detection with HOI Transformer** - Paper: https://arxiv.org/abs/2103.04503 - Code: https://github.com/bbepoch/HoiTransformer # 阴影去除(Shadow Removal) **Auto-Exposure Fusion for Single-Image Shadow Removal** - Paper: https://arxiv.org/abs/2103.01255 - Code: https://github.com/tsingqguo/exposure-fusion-shadow-removal # 虚拟换衣(Virtual Try-On) **Parser-Free Virtual Try-on via Distilling Appearance Flows** - Paper: https://arxiv.org/abs/2103.04559 - Code: https://github.com/geyuying/PF-AFN # 标签噪声(Label Noise) **A Second-Order Approach to Learning with Instance-Dependent Label Noise** -
Paper(Oral): https://arxiv.org/abs/2012.11854 - Code: https://github.com/UCSC-REAL/CAL # 视频稳像(Video Stabilization) **Real-Time Selfie Video Stabilization** - Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Yu_Real-Time_Selfie_Video_Stabilization_CVPR_2021_paper.pdf - Code: https://github.com/jiy173/selfievideostabilization # 数据集(Datasets) **Tracking Pedestrian Heads in Dense Crowd** - Homepage: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/ - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sundararaman_Tracking_Pedestrian_Heads_in_Dense_Crowd_CVPR_2021_paper.html - Code1: https://github.com/Sentient07/HeadHunter - Code2: https://github.com/Sentient07/HeadHunter%E2%80%93T - Dataset: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/ **Part-aware Panoptic Segmentation** - Paper: https://arxiv.org/abs/2106.06351 - Code: https://github.com/tue-mps/panoptic_parts - Dataset: https://github.com/tue-mps/panoptic_parts **Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos** - Homepage: https://www.yasamin.page/hdnet_tiktok - Paper(Oral): https://arxiv.org/abs/2103.03319 - Code: https://github.com/yasaminjafarian/HDNet_TikTok - Dataset: https://www.yasamin.page/hdnet_tiktok#h.jr9ifesshn7v **High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network** - Paper: https://arxiv.org/abs/2105.09188 - Code: https://github.com/csjliang/LPTN - Dataset: https://github.com/csjliang/LPTN **Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark** - Paper: https://arxiv.org/abs/2105.02440 - Code: https://github.com/VisDrone/DroneCrowd - Dataset: https://github.com/VisDrone/DroneCrowd **Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets** - Homepage: https://fidler-lab.github.io/efficient-annotation-cookbook/ - Paper(Oral): https://arxiv.org/abs/2104.12690 - Code: 
https://github.com/fidler-lab/efficient-annotation-cookbook **ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation** - Paper: https://arxiv.org/abs/2012.05258 - Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab - Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab **Learning To Count Everything** - Paper: https://arxiv.org/abs/2104.08391 - Code: https://github.com/cvlab-stonybrook/LearningToCountEverything - Dataset: https://github.com/cvlab-stonybrook/LearningToCountEverything **Semantic Image Matting** - Paper: https://arxiv.org/abs/2104.08201 - Code: https://github.com/nowsyn/SIM - Dataset: https://github.com/nowsyn/SIM **Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline** - Homepage: http://mepro.bjtu.edu.cn/resource.html - Paper: https://arxiv.org/abs/2104.06174 - Code: None **Visual Semantic Role Labeling for Video Understanding** - Homepage: https://vidsitu.org/ - Paper: https://arxiv.org/abs/2104.00990 - Code: https://github.com/TheShadow29/VidSitu - Dataset: https://github.com/TheShadow29/VidSitu **VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild** - Homepage: https://www.vspwdataset.com/ - Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf - GitHub: https://github.com/sssdddwww2/vspw_dataset_download **Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark** - Homepage: https://vap.aau.dk/sewer-ml/ - Paper: https://arxiv.org/abs/2103.10619 **Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark** - Homepage: https://vap.aau.dk/sewer-ml/ - Paper: https://arxiv.org/abs/2103.10895 **Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food** - Paper: https://arxiv.org/abs/2103.03375 - Dataset: None **Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges** - Homepage: https://github.com/QingyongHu/SensatUrban - Paper: http://arxiv.org/abs/2009.03137 - Code:
https://github.com/QingyongHu/SensatUrban - Dataset: https://github.com/QingyongHu/SensatUrban **When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework** - Paper(Oral): https://arxiv.org/abs/2103.01520 - Code: https://github.com/Hzzone/MTLFace - Dataset: https://github.com/Hzzone/MTLFace **Depth from Camera Motion and Object Detection** - Paper: https://arxiv.org/abs/2103.01468 - Code: https://github.com/griffbr/ODMD - Dataset: https://github.com/griffbr/ODMD **There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge** - Homepage: http://rl.uni-freiburg.de/research/multimodal-distill - Paper: https://arxiv.org/abs/2103.01353 - Code: http://rl.uni-freiburg.de/research/multimodal-distill **Scan2Cap: Context-aware Dense Captioning in RGB-D Scans** - Paper: https://arxiv.org/abs/2012.02206 - Code: https://github.com/daveredrum/Scan2Cap - Dataset: https://github.com/daveredrum/ScanRefer **There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge** - Paper: https://arxiv.org/abs/2103.01353 - Code: http://rl.uni-freiburg.de/research/multimodal-distill - Dataset: http://rl.uni-freiburg.de/research/multimodal-distill # 其他(Others) **Fast and Accurate Model Scaling** - Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Dollar_Fast_and_Accurate_Model_Scaling_CVPR_2021_paper.html - Code: https://github.com/facebookresearch/pycls **Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos** - Homepage: https://www.yasamin.page/hdnet_tiktok - Paper(Oral): https://arxiv.org/abs/2103.03319 - Code: https://github.com/yasaminjafarian/HDNet_TikTok - Dataset: https://www.yasamin.page/hdnet_tiktok#h.jr9ifesshn7v **Omnimatte: Associating Objects and Their Effects in Video** - Homepage: https://omnimatte.github.io/ - Paper(Oral): https://arxiv.org/abs/2105.06993 - 
Code: https://omnimatte.github.io/#code **Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets** - Homepage: https://fidler-lab.github.io/efficient-annotation-cookbook/ - Paper(Oral): https://arxiv.org/abs/2104.12690 - Code: https://github.com/fidler-lab/efficient-annotation-cookbook **Motion Representations for Articulated Animation** - Paper: https://arxiv.org/abs/2104.11280 - Code: https://github.com/snap-research/articulated-animation **Deep Lucas-Kanade Homography for Multimodal Image Alignment** - Paper: https://arxiv.org/abs/2104.11693 - Code: https://github.com/placeforyiming/CVPR21-Deep-Lucas-Kanade-Homography **Skip-Convolutions for Efficient Video Processing** - Paper: https://arxiv.org/abs/2104.11487 - Code: None **KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control** - Homepage: http://tomasjakab.github.io/KeypointDeformer - Paper(Oral): https://arxiv.org/abs/2104.11224 - Code: https://github.com/tomasjakab/keypoint_deformer/ **Learning To Count Everything** - Paper: https://arxiv.org/abs/2104.08391 - Code: https://github.com/cvlab-stonybrook/LearningToCountEverything - Dataset: https://github.com/cvlab-stonybrook/LearningToCountEverything **SOLD2: Self-supervised Occlusion-aware Line Description and Detection** - Paper(Oral): https://arxiv.org/abs/2104.03362 - Code: https://github.com/cvg/SOLD2 **Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression** - Homepage: https://li-wanhua.github.io/POEs/ - Paper: https://arxiv.org/abs/2103.13629 - Code: https://github.com/Li-Wanhua/POEs **LEAP: Learning Articulated Occupancy of People** - Paper: https://arxiv.org/abs/2104.06849 - Code: None **Visual Semantic Role Labeling for Video Understanding** - Homepage: https://vidsitu.org/ - Paper: https://arxiv.org/abs/2104.00990 - Code: https://github.com/TheShadow29/VidSitu - Dataset: https://github.com/TheShadow29/VidSitu **UAV-Human: A Large Benchmark for Human Behavior Understanding 
with Unmanned Aerial Vehicles** - Paper: https://arxiv.org/abs/2104.00946 - Code: https://github.com/SUTDCV/UAV-Human **Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning** - Paper(Oral): https://arxiv.org/abs/2104.00924 - Code: None **Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction** - Paper: https://arxiv.org/abs/2104.00858 - Code: None **Towards High Fidelity Face Relighting with Realistic Shadows** - Paper: https://arxiv.org/abs/2104.00825 - Code: None **BRepNet: A topological message passing system for solid models** - Paper(Oral): https://arxiv.org/abs/2104.00706 - Code: None **Visually Informed Binaural Audio Generation without Binaural Audios** - Homepage: https://sheldontsui.github.io/projects/PseudoBinaural - Paper: None - GitHub: https://github.com/SheldonTsui/PseudoBinaural_CVPR2021 - Demo: https://www.youtube.com/watch?v=r-uC2MyAWQc **Exploring intermediate representation for monocular vehicle pose estimation** - Paper: None - Code: https://github.com/Nicholasli1995/EgoNet **Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGB** - Paper(Oral): https://arxiv.org/abs/2103.14708 - Code: None **Invertible Image Signal Processing** - Paper: https://arxiv.org/abs/2103.15061 - Code: https://github.com/yzxing87/Invertible-ISP **Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling** - Paper: https://arxiv.org/abs/2103.14858 - Code: None **SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences** - Paper: https://arxiv.org/abs/2103.14898 - Code: None **Embedding Transfer with Label Relaxation for Improved Metric Learning** - Paper: https://arxiv.org/abs/2103.14908 - Code: None **Picasso: A CUDA-based Library for Deep Learning over 3D Meshes** - Paper: https://arxiv.org/abs/2103.15076 - Code: https://github.com/hlei-ziyan/Picasso **Meta-Mining Discriminative Samples for Kinship Verification** - Paper: 
https://arxiv.org/abs/2103.15108 - Code: None **Cloud2Curve: Generation and Vectorization of Parametric Sketches** - Paper: https://arxiv.org/abs/2103.15536 - Code: None **TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events** - Paper: https://arxiv.org/abs/2103.15538 - Code: https://github.com/SUTDCV/SUTD-TrafficQA **Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution** - Homepage: http://wellyzhang.github.io/project/prae.html - Paper: https://arxiv.org/abs/2103.14230 - Code: None **ACRE: Abstract Causal REasoning Beyond Covariation** - Homepage: http://wellyzhang.github.io/project/acre.html - Paper: https://arxiv.org/abs/2103.14232 - Code: None **Confluent Vessel Trees with Accurate Bifurcations** - Paper: https://arxiv.org/abs/2103.14268 - Code: None **Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling** - Paper: https://arxiv.org/abs/2103.14338 - Code: https://github.com/HuangZhiChao95/FewShotMotionTransfer **Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks** - Homepage: https://paschalidoud.github.io/neural_parts - Paper: None - Code: https://github.com/paschalidoud/neural_parts **Knowledge Evolution in Neural Networks** - Paper(Oral): https://arxiv.org/abs/2103.05152 - Code: https://github.com/ahmdtaha/knowledge_evolution **Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning** - Paper: https://arxiv.org/abs/2103.02148 - Code: https://github.com/guopengf/FLMRCM **SGP: Self-supervised Geometric Perception** - Paper(Oral): https://arxiv.org/abs/2103.03114 - Code: https://github.com/theNded/SGP **Diffusion Probabilistic
Models for 3D Point Cloud Generation** - Paper: https://arxiv.org/abs/2103.01458 - Code: https://github.com/luost26/diffusion-point-cloud **Scan2Cap: Context-aware Dense Captioning in RGB-D Scans** - Paper: https://arxiv.org/abs/2012.02206 - Code: https://github.com/daveredrum/Scan2Cap - Dataset: https://github.com/daveredrum/ScanRefer **There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge** - Paper: https://arxiv.org/abs/2103.01353 - Code: http://rl.uni-freiburg.de/research/multimodal-distill - Dataset: http://rl.uni-freiburg.de/research/multimodal-distill # 待添加(TODO) - [重磅!腾讯优图20篇论文入选CVPR 2021](https://mp.weixin.qq.com/s/McAtOVh0osWZ3uppEoHC8A) - [MePro团队三篇论文被CVPR 2021接收](https://mp.weixin.qq.com/s/GD5Zb6u_MQ8GZIAGeCGo3Q) # 不确定中没中(Not Sure) **CT Film Recovery via Disentangling Geometric Deformation and Photometric Degradation: Simulated Datasets and Deep Models** - Paper: none - Code: https://github.com/transcendentsky/Film-Recovery **Toward Explainable Reflection Removal with Distilling and Model Uncertainty** - Paper: none - Code: https://github.com/ytpeng-aimlab/CVPR-2021-Toward-Explainable-Reflection-Removal-with-Distilling-and-Model-Uncertainty **DeepOIS: Gyroscope-Guided Deep Optical Image Stabilizer Compensation** - Paper: none - Code: https://github.com/lhaippp/DeepOIS **Exploring Adversarial Fake Images on Face Manifold** - Paper: none - Code: https://github.com/ldz666666/Style-atk **Uncertainty-Aware Semi-Supervised Crowd Counting via Consistency-Regularized Surrogate Task** - Paper: none - Code: https://github.com/yandamengdanai/Uncertainty-Aware-Semi-Supervised-Crowd-Counting-via-Consistency-Regularized-Surrogate-Task **Temporal Contrastive Graph for Self-supervised Video Representation Learning** - Paper: none - Code: https://github.com/YangLiu9208/TCG **Boosting Monocular Depth Estimation Models to High-Resolution via Context-Aware Patching** - Paper: none - Code: 
https://github.com/ouranonymouscvpr/cvpr2021_ouranonymouscvpr **Fast and Memory-Efficient Compact Bilinear Pooling** - Paper: none - Code: https://github.com/cvpr2021kp2/cvpr2021kp2 **Identification of Empty Shelves in Supermarkets using Domain-inspired Features with Structural Support Vector Machine** - Paper: none - Code: https://github.com/gapDetection/cvpr2021 **Estimating A Child's Growth Potential From Cephalometric X-Ray Image via Morphology-Aware Interactive Keypoint Estimation** - Paper: none - Code: https://github.com/interactivekeypoint2020/Morph https://github.com/ShaoQiangShen/CVPR2021 https://github.com/gillesflash/CVPR2021 https://github.com/anonymous-submission1991/BaLeNAS https://github.com/cvpr2021dcb/cvpr2021dcb https://github.com/anonymousauthorCV/CVPR2021_PaperID_8578 https://github.com/AldrichZeng/FreqPrune https://github.com/Anonymous-AdvCAM/Anonymous-AdvCAM https://github.com/ddfss/datadrive-fss ================================================ FILE: CVPR2022-Papers-with-Code.md ================================================ # CVPR 2022 Papers and Open-Source Projects (Papers with Code) A collection of [CVPR 2022](https://cvpr2022.thecvf.com/) papers and open-source projects (papers with code)! List of accepted CVPR 2022 paper IDs: https://drive.google.com/file/d/15JFhfPboKdUcIH9LdbCMUFmGq_JhaxhC/view > Note 1: Issues sharing CVPR 2022 papers and open-source projects are welcome!
> > Note 2: For papers from previous years' top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision > > - [CVPR 2019](CVPR2019-Papers-with-Code.md) > - [CVPR 2020](CVPR2020-Papers-with-Code.md) > - [CVPR 2021](CVPR2021-Papers-with-Code.md) To keep up with the latest high-quality CV papers, open-source projects, and learning materials, scan the QR code to join the CVer academic exchange group. Learn from each other and improve together! ![](CVer学术交流群.png) ## CVPR 2022 Papers with Open-Source Code: Table of Contents - [Backbone](#Backbone) - [CLIP](#CLIP) - [GAN](#GAN) - [GNN](#GNN) - [MLP](#MLP) - [NAS](#NAS) - [OCR](#OCR) - [NeRF](#NeRF) - [3D Face](#3D Face) - [长尾分布(Long-Tail)](#Long-Tail) - [Visual Transformer](#Visual-Transformer) - [视觉和语言(Vision-Language)](#VL) - [自监督学习(Self-supervised Learning)](#SSL) - [数据增强(Data Augmentation)](#DA) - [知识蒸馏(Knowledge Distillation)](#KD) - [目标检测(Object Detection)](#Object-Detection) - [目标跟踪(Visual Tracking)](#VT) - [语义分割(Semantic Segmentation)](#Semantic-Segmentation) - [实例分割(Instance Segmentation)](#Instance-Segmentation) - [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation) - [小样本分类(Few-Shot Classification)](#FFC) - [小样本分割(Few-Shot Segmentation)](#FFS) - [图像抠图(Image Matting)](#Matting) - [视频理解(Video Understanding)](#VU) - [图像编辑(Image Editing)](#Image-Editing) - [Low-level Vision](#LLV) - [超分辨率(Super-Resolution)](#Super-Resolution) - [去模糊(Deblur)](#Deblur) - [3D点云(3D Point Cloud)](#3D-Point-Cloud) - [3D目标检测(3D Object Detection)](#3D-Object-Detection) - [3D语义分割(3D Semantic Segmentation)](#3DSS) - [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking) - [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation) - [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC) - [3D重建(3D Reconstruction)](#3D-R) - [行人重识别(Person Re-identification)](#ReID) - [伪装物体检测(Camouflaged Object Detection)](#COD) - [深度估计(Depth Estimation)](#Depth-Estimation) - [立体匹配(Stereo Matching)](#Stereo-Matching) - [特征匹配(Feature Matching)](#FM) - [车道线检测(Lane Detection)](#Lane-Detection) - [光流估计(Optical Flow Estimation)](#Optical-Flow-Estimation) - [图像修复(Image Inpainting)](#Image-Inpainting) - [图像检索(Image Retrieval)](#Image-Retrieval) - [人脸识别(Face
Recognition)](#Face-Recognition)
- [Crowd Counting](#Crowd-Counting)
- [Medical Image](#Medical-Image)
- [Video Generation](#Video Generation)
- [Scene Graph Generation](#Scene-Graph-Generation)
- [Referring Video Object Segmentation](#R-VOS)
- [Gait Recognition](#GR)
- [Style Transfer](#ST)
- [Anomaly Detection](#AD)
- [Adversarial Examples](#AE)
- [Weakly Supervised Object Localization](#WSOL)
- [Radar Object Detection](#ROD)
- [Hyperspectral Image Reconstruction](#HSI)
- [Image Stitching](#Image-Stitching)
- [Watermarking](#Watermarking)
- [Action Counting](#AC)
- [Grounded Situation Recognition](#GSR)
- [Zero-shot Learning](#ZSL)
- [DeepFakes](#DeepFakes)
- [Datasets](#Datasets)
- [New Tasks](#New-Tasks)
- [Others](#Others)

# Backbone

**A ConvNet for the 2020s**

- Paper: https://arxiv.org/abs/2201.03545
- Code: https://github.com/facebookresearch/ConvNeXt
- Explainer (Chinese): https://mp.weixin.qq.com/s/Xg5wPYExnvTqRo6s5-2cAw

**Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs**

- Paper: https://arxiv.org/abs/2203.06717
- Code: https://github.com/megvii-research/RepLKNet
- Code2: https://github.com/DingXiaoH/RepLKNet-pytorch
- Explainer (Chinese): https://mp.weixin.qq.com/s/_qXyIQut-JRW6VvsjaQlFg

**MPViT: Multi-Path Vision Transformer for Dense Prediction**

- Paper: https://arxiv.org/abs/2112.11010
- Code: https://github.com/youngwanLEE/MPViT
- Explainer (Chinese): https://mp.weixin.qq.com/s/Q9-crEOz5IYzZaNoq8oXfg

**Mobile-Former: Bridging MobileNet and Transformer**

- Paper: https://arxiv.org/abs/2108.05895
- Code: None
- Explainer (Chinese): https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ

**MetaFormer is Actually What You Need for Vision**

- Paper: https://arxiv.org/abs/2111.11418
- Code: https://github.com/sail-sg/poolformer

**Shunted Self-Attention via Multi-Scale Token Aggregation**

- Paper(Oral): https://arxiv.org/abs/2111.15193
- Code:
https://github.com/OliverRensu/Shunted-Transformer **TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing** - Paper: http://arxiv.org/abs/2203.10489 - Code: https://github.com/JierunChen/TVConv **Learned Queries for Efficient Local Attention** - Paper(Oral): https://arxiv.org/abs/2112.11435 - Code: https://github.com/moabarar/qna **RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality** - Paper: https://arxiv.org/abs/2112.11081 - Code: https://github.com/DingXiaoH/RepMLP # CLIP **HairCLIP: Design Your Hair by Text and Reference Image** - Paper: https://arxiv.org/abs/2112.05142 - Code: https://github.com/wty-ustc/HairCLIP **PointCLIP: Point Cloud Understanding by CLIP** - Paper: https://arxiv.org/abs/2112.02413 - Code: https://github.com/ZrrSkywalker/PointCLIP **Blended Diffusion for Text-driven Editing of Natural Images** - Paper: https://arxiv.org/abs/2111.14818 - Code: https://github.com/omriav/blended-diffusion # GAN **SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing** - Homepage: https://semanticstylegan.github.io/ - Paper: https://arxiv.org/abs/2112.02236 - Demo: https://semanticstylegan.github.io/videos/demo.mp4 **Style Transformer for Image Inversion and Editing** - Paper: https://arxiv.org/abs/2203.07932 - Code: https://github.com/sapphire497/style-transformer **Unsupervised Image-to-Image Translation with Generative Prior** - Homepage: https://www.mmlab-ntu.com/project/gpunit/ - Paper: https://arxiv.org/abs/2204.03641 - Code: https://github.com/williamyang1991/GP-UNIT **StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2** - Homepage: https://universome.github.io/stylegan-v - Paper: https://arxiv.org/abs/2112.14683 - Code: https://github.com/universome/stylegan-v **OSSGAN: Open-set Semi-supervised Image Generation** - Paper: https://arxiv.org/abs/2204.14249 - Code: https://github.com/raven38/OSSGAN **Neural Texture 
Extraction and Distribution for Controllable Person Image Synthesis** - Paper: https://arxiv.org/abs/2204.06160 - Code: https://github.com/RenYurui/Neural-Texture-Extraction-Distribution # GNN **OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks** - Paper: https://wanyu-lin.github.io/assets/publications/wanyu-cvpr2022.pdf - Code: https://github.com/WanyuGroup/CVPR2022-OrphicX # MLP **RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality** - Paper: https://arxiv.org/abs/2112.11081 - Code: https://github.com/DingXiaoH/RepMLP # NAS **β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search** - Paper: https://arxiv.org/abs/2203.01665 - Code: https://github.com/Sunshine-Ye/Beta-DARTS **ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior** - Paper: https://arxiv.org/abs/2111.15362 - Code: None # OCR **SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition** - Paper: https://arxiv.org/abs/2203.10209 - Code: https://github.com/mxin262/SwinTextSpotter # NeRF **Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields** - Homepage: https://jonbarron.info/mipnerf360/ - Paper: https://arxiv.org/abs/2111.12077 - Demo: https://youtu.be/YStDS2-Ln1s **Point-NeRF: Point-based Neural Radiance Fields** - Homepage: https://xharlie.github.io/projects/project_sites/pointnerf/ - Paper: https://arxiv.org/abs/2201.08845 - Code: https://github.com/Xharlie/point-nerf **NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images** - Paper: https://arxiv.org/abs/2111.13679 - Homepage: https://bmild.github.io/rawnerf/ - Demo: https://www.youtube.com/watch?v=JtBS4KBcKVc **Urban Radiance Fields** - Homepage: https://urban-radiance-fields.github.io/ - Paper: https://arxiv.org/abs/2111.14643 - Demo: https://youtu.be/qGlq5DZT6uc **Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation** - Paper: 
https://arxiv.org/abs/2202.13162 - Code: https://github.com/HexagonPrime/Pix2NeRF **HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video** - Homepage: https://grail.cs.washington.edu/projects/humannerf/ - Paper: https://arxiv.org/abs/2201.04127 - Demo: https://youtu.be/GM-RoZEymmw # 3D Face **ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural Representations** - Paper: https://arxiv.org/abs/2203.14510 - Code: https://github.com/MingwuZheng/ImFace # 长尾分布(Long-Tail) **Retrieval Augmented Classification for Long-Tail Visual Recognition** - Paper: https://arxiv.org/abs/2202.11233 - Code: None # Visual Transformer ## Backbone **MPViT : Multi-Path Vision Transformer for Dense Prediction** - Paper: https://arxiv.org/abs/2112.11010 - Code: https://github.com/youngwanLEE/MPViT **MetaFormer is Actually What You Need for Vision** - Paper: https://arxiv.org/abs/2111.11418 - Code: https://github.com/sail-sg/poolformer **Mobile-Former: Bridging MobileNet and Transformer** - Paper: https://arxiv.org/abs/2108.05895 - Code: None - 中文解读:https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ **Shunted Self-Attention via Multi-Scale Token Aggregation** - Paper(Oral): https://arxiv.org/abs/2111.15193 - Code: https://github.com/OliverRensu/Shunted-Transformer **Learned Queries for Efficient Local Attention** - Paper(Oral): https://arxiv.org/abs/2112.11435 - Code: https://github.com/moabarar/qna ## 应用(Application) **Language-based Video Editing via Multi-Modal Multi-Level Transformer** - Paper: https://arxiv.org/abs/2104.01122 - Code: None **MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video** - Paper: https://arxiv.org/abs/2203.00859 - Code: None **Embracing Single Stride 3D Object Detector with Sparse Transformer** - Paper: https://arxiv.org/abs/2112.06375 - Code: https://github.com/TuSimple/SST - 中文解读:https://zhuanlan.zhihu.com/p/476056546 **Multi-class Token Transformer for Weakly Supervised Semantic Segmentation** - 
Paper: https://arxiv.org/abs/2203.02891 - Code: https://github.com/xulianuwa/MCTformer **Spatio-temporal Relation Modeling for Few-shot Action Recognition** - Paper: https://arxiv.org/abs/2112.05132 - Code: https://github.com/Anirudh257/strm **Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction** - Paper: https://arxiv.org/abs/2111.07910 - Code: https://github.com/caiyuanhao1998/MST **Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling** - Homepage: https://point-bert.ivg-research.xyz/ - Paper: https://arxiv.org/abs/2111.14819 - Code: https://github.com/lulutang0608/Point-BERT **GroupViT: Semantic Segmentation Emerges from Text Supervision** - Homepage: https://jerryxu.net/GroupViT/ - Paper: https://arxiv.org/abs/2202.11094 - Demo: https://youtu.be/DtJsWIUTW-Y **Restormer: Efficient Transformer for High-Resolution Image Restoration** - Paper: https://arxiv.org/abs/2111.09881 - Code: https://github.com/swz30/Restormer **Splicing ViT Features for Semantic Appearance Transfer** - Homepage: https://splice-vit.github.io/ - Paper: https://arxiv.org/abs/2201.00424 - Code: https://github.com/omerbt/Splice **Self-supervised Video Transformer** - Homepage: https://kahnchana.github.io/svt/ - Paper: https://arxiv.org/abs/2112.01514 - Code: https://github.com/kahnchana/svt **Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers** - Paper: https://arxiv.org/abs/2203.02664 - Code: https://github.com/rulixiang/afa **Accelerating DETR Convergence via Semantic-Aligned Matching** - Paper: https://arxiv.org/abs/2203.06883 - Code: https://github.com/ZhangGongjie/SAM-DETR **DN-DETR: Accelerate DETR Training by Introducing Query DeNoising** - Paper: https://arxiv.org/abs/2203.01305 - Code: https://github.com/FengLi-ust/DN-DETR - 中文解读: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w **Style Transformer for Image Inversion and Editing** - Paper: https://arxiv.org/abs/2203.07932 
- Code: https://github.com/sapphire497/style-transformer **MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer** - Paper: https://arxiv.org/abs/2203.10981 - Code: https://github.com/kuanchihhuang/MonoDTR **Mask Transfiner for High-Quality Instance Segmentation** - Paper: https://arxiv.org/abs/2111.13673 - Code: https://github.com/SysCV/transfiner **Language as Queries for Referring Video Object Segmentation** - Paper: https://arxiv.org/abs/2201.00487 - Code: https://github.com/wjn922/ReferFormer - 中文解读:https://mp.weixin.qq.com/s/MkQT8QWSYoYVhJ1RSF6oPQ **X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning** - Paper: https://arxiv.org/abs/2203.00843 - Code: https://github.com/CurryYuan/X-Trans2Cap **AdaMixer: A Fast-Converging Query-Based Object Detector** - Paper(Oral): https://arxiv.org/abs/2203.16507 - Code: https://github.com/MCG-NJU/AdaMixer **Omni-DETR: Omni-Supervised Object Detection with Transformers** - Paper: https://arxiv.org/abs/2203.16089 - Code: https://github.com/amazon-research/omni-detr **SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition** - Paper: https://arxiv.org/abs/2203.10209 - Code: https://github.com/mxin262/SwinTextSpotter **TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting** - Paper(Oral): https://arxiv.org/abs/2204.01018 - Code: https://github.com/SvipRepetitionCounting/TransRAC **Collaborative Transformers for Grounded Situation Recognition** - Paper: https://arxiv.org/abs/2203.16518 - Code: https://github.com/jhcho99/CoFormer **NFormer: Robust Person Re-identification with Neighbor Transformer** - Paper: https://arxiv.org/abs/2204.09331 - Code: https://github.com/haochenheheda/NFormer **Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation** - Paper: https://arxiv.org/abs/2201.06889 - Code: None **Not All Tokens Are Equal: Human-centric Visual Analysis 
via Token Clustering Transformer** - Paper(Oral): https://arxiv.org/abs/2204.08680 - Code: https://github.com/zengwang430521/TCFormer **A New Dataset and Transformer for Stereoscopic Video Super-Resolution** - Paper: https://arxiv.org/abs/2204.10039 - Code: https://github.com/H-deep/Trans-SVSR/ - Dataset: http://shorturl.at/mpwGX **Safe Self-Refinement for Transformer-based Domain Adaptation** - Paper: https://arxiv.org/abs/2204.07683 - Code: https://github.com/tsun/SSRT **Fast Point Transformer** - Homepage: http://cvlab.postech.ac.kr/research/FPT/ - Paper: https://arxiv.org/abs/2112.04702 - Code: https://github.com/POSTECH-CVLab/FastPointTransformer **Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval** - Paper: https://arxiv.org/abs/2204.09730 - Code: https://github.com/mshukor/TFood **DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation** - Paper: https://arxiv.org/abs/2111.14887 - Code: https://github.com/lhoyer/DAFormer **Stratified Transformer for 3D Point Cloud Segmentation** - Paper: https://arxiv.org/pdf/2203.14508.pdf - Code: https://github.com/dvlab-research/Stratified-Transformer # 视觉和语言(Vision-Language) **Conditional Prompt Learning for Vision-Language Models** - Paper: https://arxiv.org/abs/2203.05557 - Code: https://github.com/KaiyangZhou/CoOp **Bridging Video-text Retrieval with Multiple Choice Question** - Paper: https://arxiv.org/abs/2201.04850 - Code: https://github.com/TencentARC/MCQ **Visual Abductive Reasoning** - Paper: https://arxiv.org/abs/2203.14040 - Code: https://github.com/leonnnop/VAR # 自监督学习(Self-supervised Learning) **UniVIP: A Unified Framework for Self-Supervised Visual Pre-training** - Paper: https://arxiv.org/abs/2203.06965 - Code: None **Crafting Better Contrastive Views for Siamese Representation Learning** - Paper: https://arxiv.org/abs/2202.03278 - Code: https://github.com/xyupeng/ContrastiveCrop - 
Explainer (Chinese): https://mp.weixin.qq.com/s/VTP9D5f7KG9vg30U9kVI2A

**HCSC: Hierarchical Contrastive Selective Coding**

- Homepage: https://github.com/gyfastas/HCSC
- Paper: https://arxiv.org/abs/2202.00455
- Explainer (Chinese): https://mp.weixin.qq.com/s/jkYR8mYp-e645qk8kfPNKQ

**DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis**

- Paper: https://arxiv.org/abs/2204.10437
- Code: https://github.com/JLiangLab/DiRA

# Data Augmentation

**TeachAugment: Data Augmentation Optimization Using Teacher Knowledge**

- Paper: https://arxiv.org/abs/2202.12513
- Code: https://github.com/DensoITLab/TeachAugment

**AlignMixup: Improving Representations By Interpolating Aligned Features**

- Paper: https://arxiv.org/abs/2103.15375
- Code: https://github.com/shashankvkt/AlignMixup_CVPR22

# Knowledge Distillation

**Decoupled Knowledge Distillation**

- Paper: https://arxiv.org/abs/2203.08679
- Code: https://github.com/megvii-research/mdistiller
- Explainer (Chinese): https://mp.weixin.qq.com/s/-4AA0zKIXh9Ei9-vc5jOhw

# Object Detection

**BoxeR: Box-Attention for 2D and 3D Transformers**

- Paper: https://arxiv.org/abs/2111.13087
- Code: https://github.com/kienduynguyen/BoxeR
- Explainer (Chinese): https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w

**DN-DETR: Accelerate DETR Training by Introducing Query DeNoising**

- Paper: https://arxiv.org/abs/2203.01305
- Code: https://github.com/FengLi-ust/DN-DETR
- Explainer (Chinese): https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w

**Accelerating DETR Convergence via Semantic-Aligned Matching**

- Paper: https://arxiv.org/abs/2203.06883
- Code: https://github.com/ZhangGongjie/SAM-DETR

**Localization Distillation for Dense Object Detection**

- Paper: https://arxiv.org/abs/2102.12252
- Code: https://github.com/HikariTJU/LD
- Explainer (Chinese): https://mp.weixin.qq.com/s/dxss8RjJH283h6IbPCT9vg

**Focal and Global Knowledge Distillation for Detectors**

- Paper: https://arxiv.org/abs/2111.11837
- Code: https://github.com/yzd-v/FGD
- 
中文解读:https://mp.weixin.qq.com/s/yDkreTudC8JL2V2ETsADwQ **A Dual Weighting Label Assignment Scheme for Object Detection** - Paper: https://arxiv.org/abs/2203.09730 - Code: https://github.com/strongwolf/DW **AdaMixer: A Fast-Converging Query-Based Object Detector** - Paper(Oral): https://arxiv.org/abs/2203.16507 - Code: https://github.com/MCG-NJU/AdaMixer **Omni-DETR: Omni-Supervised Object Detection with Transformers** - Paper: https://arxiv.org/abs/2203.16089 - Code: https://github.com/amazon-research/omni-detr **SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection** - Paper(Oral): https://arxiv.org/abs/2203.06398 - Code: https://github.com/CityU-AIM-Group/SIGMA ## 半监督目标检测 **Dense Learning based Semi-Supervised Object Detection** - Paper: https://arxiv.org/abs/2204.07300 - Code: https://github.com/chenbinghui1/DSL # 目标跟踪(Visual Tracking) **Correlation-Aware Deep Tracking** - Paper: https://arxiv.org/abs/2203.01666 - Code: None **TCTrack: Temporal Contexts for Aerial Tracking** - Paper: https://arxiv.org/abs/2203.01885 - Code: https://github.com/vision4robotics/TCTrack ## 多模态目标跟踪 **Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline** - Homepage: https://zhang-pengyu.github.io/DUT-VTUAV/ - Paper: https://arxiv.org/abs/2204.04120 ## 多目标跟踪(Multi-Object Tracking) **Learning of Global Objective for Network Flow in Multi-Object Tracking** - Paper: https://arxiv.org/abs/2203.16210 - Code: None **DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion** - Homepage: https://dancetrack.github.io - Paper: https://arxiv.org/abs/2111.14690 - Dataset: https://github.com/DanceTrack/DanceTrack # 语义分割(Semantic Segmentation) **Novel Class Discovery in Semantic Segmentation** - Homepage: https://ncdss.github.io/ - Paper: https://arxiv.org/abs/2112.01900 - Code: https://github.com/HeliosZhao/NCDSS **Deep Hierarchical Semantic Segmentation** - Paper: https://arxiv.org/abs/2203.14335 - Code: 
https://github.com/0liliulei/HieraSeg

**Rethinking Semantic Segmentation: A Prototype View**

- Paper(Oral): https://arxiv.org/abs/2203.15102
- Code: https://github.com/tfzhou/ProtoSeg

## Weakly Supervised Semantic Segmentation

**Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2203.00962
- Code: https://github.com/zhaozhengChen/ReCAM

**Multi-class Token Transformer for Weakly Supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2203.02891
- Code: https://github.com/xulianuwa/MCTformer

**Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers**

- Paper: https://arxiv.org/abs/2203.02664
- Code: https://github.com/rulixiang/afa

**CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2203.02668
- Code: https://github.com/CVI-SZU/CLIMS

**CCAM: Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation**

- Paper: https://arxiv.org/abs/2203.13505
- Code: https://github.com/CVI-SZU/CCAM

**FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation**

- Homepage: http://cvlab.postech.ac.kr/research/FIFO/
- Paper(Oral): https://arxiv.org/abs/2204.01587
- Code: https://github.com/sohyun-l/FIFO

**Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2203.09653
- Code: https://github.com/maeve07/RCA.git

## Semi-Supervised Semantic Segmentation

**ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2106.05095
- Code: https://github.com/LiheYoung/ST-PlusPlus
- Explainer (Chinese): https://mp.weixin.qq.com/s/knSnlebdtEnmrkChGM_0CA

**Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels**

- Homepage: https://haochen-wang409.github.io/U2PL/
- Paper: https://arxiv.org/abs/2203.03884
- Code: https://github.com/Haochen-Wang409/U2PL
- Explainer (Chinese): 
https://mp.weixin.qq.com/s/-08olqE7np8A1XQzt6HAgQ **Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation** - Paper: https://arxiv.org/pdf/2111.12903.pdf - Code: https://github.com/yyliu01/PS-MT ## 域自适应语义分割 **Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation** - Paper: https://arxiv.org/abs/2111.12940 - Code: https://github.com/BIT-DA/RIPU **DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation** - Paper: https://arxiv.org/abs/2111.14887 - Code: https://github.com/lhoyer/DAFormer ## 无监督语义分割 **GroupViT: Semantic Segmentation Emerges from Text Supervision** - Homepage: https://jerryxu.net/GroupViT/ - Paper: https://arxiv.org/abs/2202.11094 - Demo: https://youtu.be/DtJsWIUTW-Y ## 少样本语义分割 **Generalized Few-shot Semantic Segmentation** - Paper: https://jiaya.me/papers/cvpr22_zhuotao.pdf - Code: https://github.com/dvlab-research/GFS-Seg # 实例分割(Instance Segmentation) **BoxeR: Box-Attention for 2D and 3D Transformers** - Paper: https://arxiv.org/abs/2111.13087 - Code: https://github.com/kienduynguyen/BoxeR - 中文解读:https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w **E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation** - Paper: https://arxiv.org/abs/2203.04074 - Code: https://github.com/zhang-tao-whu/e2ec **Mask Transfiner for High-Quality Instance Segmentation** - Paper: https://arxiv.org/abs/2111.13673 - Code: https://github.com/SysCV/transfiner **Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity** - Homepage: https://sites.google.com/view/generic-grouping/ - Paper: https://arxiv.org/abs/2204.06107 - Code: https://github.com/facebookresearch/Generic-Grouping ## 自监督实例分割 **FreeSOLO: Learning to Segment Objects without Annotations** - Paper: https://arxiv.org/abs/2202.12181 - Code: https://github.com/NVlabs/FreeSOLO ## 视频实例分割 
**Efficient Video Instance Segmentation via Tracklet Query and Proposal** - Homepage: https://jialianwu.com/projects/EfficientVIS.html - Paper: https://arxiv.org/abs/2203.01853 - Demo: https://youtu.be/sSPMzgtMKCE **Temporally Efficient Vision Transformer for Video Instance Segmentation** - Paper: https://arxiv.org/abs/2204.08412 - Code: https://github.com/hustvl/TeViT # 全景分割(Panoptic Segmentation) **Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers** - Paper: https://arxiv.org/abs/2109.03814 - Code: https://github.com/zhiqi-li/Panoptic-SegFormer **Large-scale Video Panoptic Segmentation in the Wild: A Benchmark** - Paper: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset/blob/main/VIPSeg2022.pdf - Code: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset - Dataset: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset # 小样本分类(Few-Shot Classification) **Integrative Few-Shot Learning for Classification and Segmentation** - Paper: https://arxiv.org/abs/2203.15712 - Code: https://github.com/dahyun-kang/ifsl **Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification** - Paper: https://arxiv.org/abs/2106.05517 - Code: https://github.com/LouieYang/MCL # 小样本分割(Few-Shot Segmentation) **Learning What Not to Segment: A New Perspective on Few-Shot Segmentation** - Paper: https://arxiv.org/abs/2203.07615 - Code: https://github.com/chunbolang/BAM **Integrative Few-Shot Learning for Classification and Segmentation** - Paper: https://arxiv.org/abs/2203.15712 - Code: https://github.com/dahyun-kang/ifsl **Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation** - Paper: https://arxiv.org/abs/2204.10638 - Code: None # 图像抠图(Image Matting) **Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation** - Paper: https://arxiv.org/abs/2201.06889 - Code: None # 视频理解(Video Understanding) **Self-supervised Video Transformer** - Homepage: https://kahnchana.github.io/svt/ - Paper: 
https://arxiv.org/abs/2112.01514 - Code: https://github.com/kahnchana/svt **TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting** - Paper(Oral): https://arxiv.org/abs/2204.01018 - Code: https://github.com/SvipRepetitionCounting/TransRAC **FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment** - Paper(Oral): https://arxiv.org/abs/2204.03646 - Dataset: https://github.com/xujinglin/FineDiving - Code: https://github.com/xujinglin/FineDiving - 中文解读:https://mp.weixin.qq.com/s/8t12Y34eMNwvJr8PeryWXg **Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition** - Paper(Oral): https://arxiv.org/abs/2204.02148 - Code: None ## 行为识别(Action Recognition) **Spatio-temporal Relation Modeling for Few-shot Action Recognition** - Paper: https://arxiv.org/abs/2112.05132 - Code: https://github.com/Anirudh257/strm ## 动作检测(Action Detection) **End-to-End Semi-Supervised Learning for Video Action Detection** - Paper: https://arxiv.org/abs/2203.04251 - Code: None # 图像编辑(Image Editing) **Style Transformer for Image Inversion and Editing** - Paper: https://arxiv.org/abs/2203.07932 - Code: https://github.com/sapphire497/style-transformer **Blended Diffusion for Text-driven Editing of Natural Images** - Paper: https://arxiv.org/abs/2111.14818 - Code: https://github.com/omriav/blended-diffusion **SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing** - Homepage: https://semanticstylegan.github.io/ - Paper: https://arxiv.org/abs/2112.02236 - Demo: https://semanticstylegan.github.io/videos/demo.mp4 # Low-level Vision **ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior** - Paper: https://arxiv.org/abs/2111.15362 - Code: None **Restormer: Efficient Transformer for High-Resolution Image Restoration** - Paper: https://arxiv.org/abs/2111.09881 - Code: https://github.com/swz30/Restormer **Robust Equivariant Imaging: a fully 
unsupervised framework for learning to image from noisy and partial measurements** - Paper(Oral): https://arxiv.org/abs/2111.12855 - Code: https://github.com/edongdongchen/REI # 超分辨率(Super-Resolution) ## 图像超分辨率(Image Super-Resolution) **Learning the Degradation Distribution for Blind Image Super-Resolution** - Paper: https://arxiv.org/abs/2203.04962 - Code: https://github.com/greatlog/UnpairedSR ## 视频超分辨率(Video Super-Resolution) **BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment** - Paper: https://arxiv.org/abs/2104.13371 - Code: https://github.com/open-mmlab/mmediting - Code: https://github.com/ckkelvinchan/BasicVSR_PlusPlus - 中文解读:https://mp.weixin.qq.com/s/HZTwYfphixyLHxlbCAxx4g **Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling** - Paper: https://arxiv.org/abs/2204.07114 - Code: None **A New Dataset and Transformer for Stereoscopic Video Super-Resolution** - Paper: https://arxiv.org/abs/2204.10039 - Code: https://github.com/H-deep/Trans-SVSR/ - Dataset: http://shorturl.at/mpwGX # 去模糊(Deblur) ## 图像去模糊(Image Deblur) **Learning to Deblur using Light Field Generated and Real Defocus Images** - Homepage: http://lyruan.com/Projects/DRBNet/ - Paper(Oral): https://arxiv.org/abs/2204.00442 - Code: https://github.com/lingyanruan/DRBNet # 3D点云(3D Point Cloud) **Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling** - Homepage: https://point-bert.ivg-research.xyz/ - Paper: https://arxiv.org/abs/2111.14819 - Code: https://github.com/lulutang0608/Point-BERT **A Unified Query-based Paradigm for Point Cloud Understanding** - Paper: https://arxiv.org/abs/2203.01252 - Code: None **CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding** - Paper: https://arxiv.org/abs/2203.00680 - Code: https://github.com/MohamedAfham/CrossPoint **PointCLIP: Point Cloud Understanding by CLIP** - Paper: https://arxiv.org/abs/2112.02413 - Code: 
https://github.com/ZrrSkywalker/PointCLIP **Fast Point Transformer** - Homepage: http://cvlab.postech.ac.kr/research/FPT/ - Paper: https://arxiv.org/abs/2112.04702 - Code: https://github.com/POSTECH-CVLab/FastPointTransformer **RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds** - Paper: https://arxiv.org/abs/2205.11028 - Code: https://github.com/gxd1994/RCP **The Devil is in the Pose: Ambiguity-free 3D Rotation-invariant Learning via Pose-aware Convolution** - Paper: https://arxiv.org/abs/2205.15210 - Code: https://github.com/GostInShell/PaRI-Conv # 3D目标检测(3D Object Detection) **Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds** - Paper(Oral): https://arxiv.org/abs/2203.11139 - Code: https://github.com/yifanzhang713/IA-SSD - Demo: https://www.youtube.com/watch?v=3jP2o9KXunA **BoxeR: Box-Attention for 2D and 3D Transformers** - Paper: https://arxiv.org/abs/2111.13087 - Code: https://github.com/kienduynguyen/BoxeR - 中文解读:https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w **Embracing Single Stride 3D Object Detector with Sparse Transformer** - Paper: https://arxiv.org/abs/2112.06375 - Code: https://github.com/TuSimple/SST **Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes** - Paper: https://arxiv.org/abs/2011.12001 - Code: https://github.com/qq456cvb/CanonicalVoting **MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer** - Paper: https://arxiv.org/abs/2203.10981 - Code: https://github.com/kuanchihhuang/MonoDTR **HyperDet3D: Learning a Scene-conditioned 3D Object Detector** - Paper: https://arxiv.org/abs/2204.05599 - Code: None **OccAM's Laser: Occlusion-based Attribution Maps for 3D Object Detectors on LiDAR Data** - Paper: https://arxiv.org/abs/2204.06577 - Code: https://github.com/dschinagl/occam **DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection** - Homepage: https://thudair.baai.ac.cn/index - Paper: 
https://arxiv.org/abs/2204.05575 - Code: https://github.com/AIR-THU/DAIR-V2X **Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions** - Homepage: https://ithaca365.mae.cornell.edu/ - Paper: https://arxiv.org/abs/2208.01166 # 3D语义分割(3D Semantic Segmentation) **Scribble-Supervised LiDAR Semantic Segmentation** - Paper: https://arxiv.org/abs/2203.08537 - Dataset: https://github.com/ouenal/scribblekitti **Stratified Transformer for 3D Point Cloud Segmentation** - Paper: https://arxiv.org/pdf/2203.14508.pdf - Code: https://github.com/dvlab-research/Stratified-Transformer # 3D实例分割(3D Instance Segmentation) **Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions** - Homepage: https://ithaca365.mae.cornell.edu/ - Paper: https://arxiv.org/abs/2208.01166 # 3D目标跟踪(3D Object Tracking) **Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds** - Paper: https://arxiv.org/abs/2203.01730 - Code: https://github.com/Ghostish/Open3DSOT **PTTR: Relational 3D Point Cloud Object Tracking with Transformer** - Paper: https://arxiv.org/abs/2112.02857 - Code: https://github.com/Jasonkks/PTTR # 3D人体姿态估计(3D Human Pose Estimation) **MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation** - Paper: https://arxiv.org/abs/2111.12707 - Code: https://github.com/Vegetebird/MHFormer - 中文解读: https://zhuanlan.zhihu.com/p/439459426 **MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video** - Paper: https://arxiv.org/abs/2203.00859 - Code: None **Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation** - Paper: https://arxiv.org/abs/2203.07697 - Code: None - 中文解读:https://mp.weixin.qq.com/s/L_F28IFLXvs5R4V9TTUpRw **BEV: Putting People in their Place: Monocular Regression of 3D People in Depth** - Homepage: https://arthur151.github.io/BEV/BEV.html - Paper: https://arxiv.org/abs/2112.08274 - Code: 
https://github.com/Arthur151/ROMP - Dataset: https://github.com/Arthur151/Relative_Human - Demo: https://www.youtube.com/watch?v=Q62fj_6AxRI # 3D语义场景补全(3D Semantic Scene Completion) **MonoScene: Monocular 3D Semantic Scene Completion** - Paper: https://arxiv.org/abs/2112.00726 - Code: https://github.com/cv-rits/MonoScene # 3D重建(3D Reconstruction) **BANMo: Building Animatable 3D Neural Models from Many Casual Videos** - Homepage: https://banmo-www.github.io/ - Paper: https://arxiv.org/abs/2112.12761 - Code: https://github.com/facebookresearch/banmo - 中文解读:https://mp.weixin.qq.com/s/NMHP8-xWwrX40vpGx55Qew # 行人重识别(Person Re-identification) **NFormer: Robust Person Re-identification with Neighbor Transformer** - Paper: https://arxiv.org/abs/2204.09331 - Code: https://github.com/haochenheheda/NFormer # 伪装物体检测(Camouflaged Object Detection) **Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection** - Paper: https://arxiv.org/abs/2203.02688 - Code: https://github.com/lartpang/ZoomNet # 深度估计(Depth Estimation) ## 单目深度估计 **NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation** - Paper: https://arxiv.org/abs/2203.01502 - Code: None **OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion** - Paper: https://arxiv.org/abs/2203.00838 - Code: None **Toward Practical Self-Supervised Monocular Indoor Depth Estimation** - Paper: https://arxiv.org/abs/2112.02306 - Code: None **P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior** - Paper: https://arxiv.org/abs/2204.02091 - Code: https://github.com/SysCV/P3Depth **Multi-Frame Self-Supervised Depth with Transformers** - Homepage: https://sites.google.com/tri.global/depthformer - Paper: https://arxiv.org/abs/2204.07616 - Code: None # 立体匹配(Stereo Matching) **ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching** - Paper: https://arxiv.org/abs/2203.02146 - Code: https://github.com/gangweiX/ACVNet # 特征匹配(Feature Matching) 
**ClusterGNN: Cluster-based Coarse-to-Fine Graph Neural Network for Efficient Feature Matching** - Paper: https://arxiv.org/abs/2204.11700 - Code: None # 车道线检测(Lane Detection) **Rethinking Efficient Lane Detection via Curve Modeling** - Paper: https://arxiv.org/abs/2203.02431 - Code: https://github.com/voldemortX/pytorch-auto-drive - Demo:https://user-images.githubusercontent.com/32259501/148680744-a18793cd-f437-461f-8c3a-b909c9931709.mp4 **A Keypoint-based Global Association Network for Lane Detection** - Paper: https://arxiv.org/abs/2204.07335 - Code: https://github.com/Wolfwjs/GANet # 光流估计(Optical Flow Estimation) **Imposing Consistency for Optical Flow Estimation** - Paper: https://arxiv.org/abs/2204.07262 - Code: None **Deep Equilibrium Optical Flow Estimation** - Paper: https://arxiv.org/abs/2204.08442 - Code: https://github.com/locuslab/deq-flow **GMFlow: Learning Optical Flow via Global Matching** - Paper(Oral): https://arxiv.org/abs/2111.13680 - Code: https://github.com/haofeixu/gmflow # 图像修复(Image Inpainting) **Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding** - Paper: https://arxiv.org/abs/2203.00867 - Code: https://github.com/DQiaole/ZITS_inpainting # 图像检索(Image Retrieval) **Correlation Verification for Image Retrieval** - Paper(Oral): https://arxiv.org/abs/2204.01458 - Code: https://github.com/sungonce/CVNet # 人脸识别(Face Recognition) **AdaFace: Quality Adaptive Margin for Face Recognition** - Paper(Oral): https://arxiv.org/abs/2204.00964 - Code: https://github.com/mk-minchul/AdaFace # 人群计数(Crowd Counting) **Leveraging Self-Supervision for Cross-Domain Crowd Counting** - Paper: https://arxiv.org/abs/2103.16291 - Code: None # 医学图像(Medical Image) **BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation** - Paper: https://arxiv.org/abs/2203.02533 - Code: None **Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification** - 
Paper: https://arxiv.org/abs/2111.12918 - Code: https://github.com/FBLADL/ACPL **DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis** - Paper: https://arxiv.org/abs/2204.10437 - Code: https://github.com/JLiangLab/DiRA # 视频生成(Video Generation) **StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2** - Homepage: https://universome.github.io/stylegan-v - Paper: https://arxiv.org/abs/2112.14683 - Code: https://github.com/universome/stylegan-v - Demo: https://kaust-cair.s3.amazonaws.com/stylegan-v/stylegan-v.mp4 # 场景图生成(Scene Graph Generation) **SGTR: End-to-end Scene Graph Generation with Transformer** - Paper: https://arxiv.org/abs/2112.12970 - Code: None # 参考视频目标分割(Referring Video Object Segmentation) **Language as Queries for Referring Video Object Segmentation** - Paper: https://arxiv.org/abs/2201.00487 - Code: https://github.com/wjn922/ReferFormer **ReSTR: Convolution-free Referring Image Segmentation Using Transformers** - Paper: https://arxiv.org/abs/2203.16768 - Code: None # 步态识别(Gait Recognition) **Gait Recognition in the Wild with Dense 3D Representations and A Benchmark** - Homepage: https://gait3d.github.io/ - Paper: https://arxiv.org/abs/2204.02569 - Code: https://github.com/Gait3D/Gait3D-Benchmark # 风格迁移(Style Transfer) **StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions** - Homepage: https://lukashoel.github.io/stylemesh/ - Paper: https://arxiv.org/abs/2112.01530 - Code: https://github.com/lukasHoel/stylemesh - Demo: https://www.youtube.com/watch?v=ZqgiTLcNcks # 异常检测(Anomaly Detection) **UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection** - Paper: https://arxiv.org/abs/2111.08644 - Dataset: https://github.com/lilygeorgescu/UBnormal **Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection** - Paper(Oral): https://arxiv.org/abs/2111.09099 - Code: https://github.com/ristea/sspcab # 对抗样本(Adversarial
Examples) **Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon** - Paper: https://arxiv.org/abs/2203.03818 - Code: https://github.com/hncszyq/ShadowAttack **LAS-AT: Adversarial Training with Learnable Attack Strategy** - Paper(Oral): https://arxiv.org/abs/2203.06616 - Code: https://github.com/jiaxiaojunQAQ/LAS-AT **Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection** - Paper: https://arxiv.org/abs/2112.04532 - Code: https://github.com/joellliu/SegmentAndComplete # 弱监督物体检测(Weakly Supervised Object Localization) **Weakly Supervised Object Localization as Domain Adaption** - Paper: https://arxiv.org/abs/2203.01714 - Code: https://github.com/zh460045050/DA-WSOL_CVPR2022 # 雷达目标检测(Radar Object Detection) **Exploiting Temporal Relations on Radar Perception for Autonomous Driving** - Paper: https://arxiv.org/abs/2204.01184 - Code: None # 高光谱图像重建(Hyperspectral Image Reconstruction) **Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction** - Paper: https://arxiv.org/abs/2111.07910 - Code: https://github.com/caiyuanhao1998/MST # 图像拼接(Image Stitching) **Deep Rectangling for Image Stitching: A Learning Baseline** - Paper(Oral): https://arxiv.org/abs/2203.03831 - Code: https://github.com/nie-lang/DeepRectangling - Dataset: https://github.com/nie-lang/DeepRectangling - 中文解读:https://mp.weixin.qq.com/s/lp5AnrtO_9urp-Fv6Z0l2Q # 水印(Watermarking) **Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings** - Paper: https://arxiv.org/abs/2104.13450 - Code: None # Action Counting **TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting** - Paper(Oral): https://arxiv.org/abs/2204.01018 - Dataset: https://svip-lab.github.io/dataset/RepCount_dataset.html - Code: https://github.com/SvipRepetitionCounting/TransRAC # Grounded Situation Recognition 
**Collaborative Transformers for Grounded Situation Recognition** - Paper: https://arxiv.org/abs/2203.16518 - Code: https://github.com/jhcho99/CoFormer # Zero-shot Learning **Unseen Classes at a Later Time? No Problem** - Paper: https://arxiv.org/abs/2203.16517 - Code: https://github.com/sumitramalagi/Unseen-classes-at-a-later-time # DeepFakes **Detecting Deepfakes with Self-Blended Images** - Paper(Oral): https://arxiv.org/abs/2204.08376 - Code: https://github.com/mapooon/SelfBlendedImages # 数据集(Datasets) **It's About Time: Analog Clock Reading in the Wild** - Homepage: https://charigyang.github.io/abouttime/ - Paper: https://arxiv.org/abs/2111.09162 - Code: https://github.com/charigyang/itsabouttime - Demo: https://youtu.be/cbiMACA6dRc **Toward Practical Self-Supervised Monocular Indoor Depth Estimation** - Paper: https://arxiv.org/abs/2112.02306 - Code: None **Kubric: A scalable dataset generator** - Paper: https://arxiv.org/abs/2203.03570 - Code: https://github.com/google-research/kubric - 中文解读:https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg **Scribble-Supervised LiDAR Semantic Segmentation** - Paper: https://arxiv.org/abs/2203.08537 - Dataset: https://github.com/ouenal/scribblekitti **Deep Rectangling for Image Stitching: A Learning Baseline** - Paper(Oral): https://arxiv.org/abs/2203.03831 - Code: https://github.com/nie-lang/DeepRectangling - Dataset: https://github.com/nie-lang/DeepRectangling - 中文解读:https://mp.weixin.qq.com/s/lp5AnrtO_9urp-Fv6Z0l2Q **ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer** - Homepage: https://ai.stanford.edu/~rhgao/objectfolder2.0/ - Paper: https://arxiv.org/abs/2204.02389 - Dataset: https://github.com/rhgao/ObjectFolder - Demo:https://youtu.be/e5aToT3LkRA **Shape from Polarization for Complex Scenes in the Wild** - Homepage: https://chenyanglei.github.io/sfpwild/index.html - Paper: https://arxiv.org/abs/2112.11377 - Code: https://github.com/ChenyangLEI/sfp-wild **Visible-Thermal UAV Tracking: A Large-Scale 
Benchmark and New Baseline** - Homepage: https://zhang-pengyu.github.io/DUT-VTUAV/ - Paper: https://arxiv.org/abs/2204.04120 **TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting** - Paper(Oral): https://arxiv.org/abs/2204.01018 - Dataset: https://svip-lab.github.io/dataset/RepCount_dataset.html - Code: https://github.com/SvipRepetitionCounting/TransRAC **FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment** - Paper(Oral): https://arxiv.org/abs/2204.03646 - Dataset: https://github.com/xujinglin/FineDiving - Code: https://github.com/xujinglin/FineDiving - 中文解读:https://mp.weixin.qq.com/s/8t12Y34eMNwvJr8PeryWXg **Aesthetic Text Logo Synthesis via Content-aware Layout Inferring** - Paper: https://arxiv.org/abs/2204.02701 - Dataset: https://github.com/yizhiwang96/TextLogoLayout - Code: https://github.com/yizhiwang96/TextLogoLayout **DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection** - Homepage: https://thudair.baai.ac.cn/index - Paper: https://arxiv.org/abs/2204.05575 - Code: https://github.com/AIR-THU/DAIR-V2X **A New Dataset and Transformer for Stereoscopic Video Super-Resolution** - Paper: https://arxiv.org/abs/2204.10039 - Code: https://github.com/H-deep/Trans-SVSR/ - Dataset: http://shorturl.at/mpwGX **Putting People in their Place: Monocular Regression of 3D People in Depth** - Homepage: https://arthur151.github.io/BEV/BEV.html - Paper: https://arxiv.org/abs/2112.08274 - Code:https://github.com/Arthur151/ROMP - Dataset: https://github.com/Arthur151/Relative_Human **UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection** - Paper: https://arxiv.org/abs/2111.08644 - Dataset: https://github.com/lilygeorgescu/UBnormal **DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion** - Homepage: https://dancetrack.github.io - Paper: https://arxiv.org/abs/2111.14690 - Dataset: https://github.com/DanceTrack/DanceTrack 
**Visual Abductive Reasoning** - Paper: https://arxiv.org/abs/2203.14040 - Code: https://github.com/leonnnop/VAR **Large-scale Video Panoptic Segmentation in the Wild: A Benchmark** - Paper: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset/blob/main/VIPSeg2022.pdf - Code: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset - Dataset: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset **Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions** - Homepage: https://ithaca365.mae.cornell.edu/ - Paper: https://arxiv.org/abs/2208.01166 # 新任务(New Task) **Language-based Video Editing via Multi-Modal Multi-Level Transformer** - Paper: https://arxiv.org/abs/2104.01122 - Code: None **It's About Time: Analog Clock Reading in the Wild** - Homepage: https://charigyang.github.io/abouttime/ - Paper: https://arxiv.org/abs/2111.09162 - Code: https://github.com/charigyang/itsabouttime - Demo: https://youtu.be/cbiMACA6dRc **Splicing ViT Features for Semantic Appearance Transfer** - Homepage: https://splice-vit.github.io/ - Paper: https://arxiv.org/abs/2201.00424 - Code: https://github.com/omerbt/Splice **Visual Abductive Reasoning** - Paper: https://arxiv.org/abs/2203.14040 - Code: https://github.com/leonnnop/VAR # 其他(Others) **Kubric: A scalable dataset generator** - Paper: https://arxiv.org/abs/2203.03570 - Code: https://github.com/google-research/kubric - 中文解读:https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg **X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning** - Paper: https://arxiv.org/abs/2203.00843 - Code: https://github.com/CurryYuan/X-Trans2Cap **Balanced MSE for Imbalanced Visual Regression** - Paper(Oral): https://arxiv.org/abs/2203.16427 - Code: https://github.com/jiawei-ren/BalancedMSE **SNUG: Self-Supervised Neural Dynamic Garments** - Homepage: http://mslab.es/projects/SNUG/ - Paper(Oral): https://arxiv.org/abs/2204.02219 - Code: https://github.com/isantesteban/snug **Shape from Polarization for Complex 
Scenes in the Wild** - Homepage: https://chenyanglei.github.io/sfpwild/index.html - Paper: https://arxiv.org/abs/2112.11377 - Code: https://github.com/ChenyangLEI/sfp-wild **LASER: LAtent SpacE Rendering for 2D Visual Localization** - Paper(Oral): https://arxiv.org/abs/2204.00157 - Code: None **Single-Photon Structured Light** - Paper(Oral): https://arxiv.org/abs/2204.05300 - Code: None **3DeformRS: Certifying Spatial Deformations on Point Clouds** - Paper: https://arxiv.org/abs/2204.05687 - Code: None **Aesthetic Text Logo Synthesis via Content-aware Layout Inferring** - Paper: https://arxiv.org/abs/2204.02701 - Dataset: https://github.com/yizhiwang96/TextLogoLayout - Code: https://github.com/yizhiwang96/TextLogoLayout **Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes** - Paper: https://arxiv.org/abs/2203.13412 - Code: https://github.com/zjsong/SSPL **Robust and Accurate Superquadric Recovery: a Probabilistic Approach** - Paper(Oral): https://arxiv.org/abs/2111.14517 - Code: https://github.com/bmlklwx/EMS-superquadric_fitting **Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence** - Paper: https://arxiv.org/abs/2203.00911 - Code: None **Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer** - Paper(Oral): https://arxiv.org/abs/2204.08680 - Code: https://github.com/zengwang430521/TCFormer **DeepDPM: Deep Clustering With an Unknown Number of Clusters** - Paper: https://arxiv.org/abs/2203.14309 - Code: https://github.com/BGU-CS-VIL/DeepDPM **ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic** - Paper: https://arxiv.org/abs/2111.14447 - Code: https://github.com/YoadTew/zero-shot-image-to-text **Proto2Proto: Can you recognize the car, the way I do?** - Paper: https://arxiv.org/abs/2204.11830 - Code: https://github.com/archmaester/proto2proto **Putting People in their Place: Monocular Regression of 3D People in 
Depth** - Homepage: https://arthur151.github.io/BEV/BEV.html - Paper: https://arxiv.org/abs/2112.08274 - Code: https://github.com/Arthur151/ROMP - Dataset: https://github.com/Arthur151/Relative_Human **Light Field Neural Rendering** - Homepage: https://light-field-neural-rendering.github.io/ - Paper(Oral): https://arxiv.org/abs/2112.09687 - Code: https://github.com/google-research/google-research/tree/master/light_field_neural_rendering **Neural Texture Extraction and Distribution for Controllable Person Image Synthesis** - Paper: https://arxiv.org/abs/2204.06160 - Code: https://github.com/RenYurui/Neural-Texture-Extraction-Distribution **Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning** - Paper: https://arxiv.org/abs/2203.14333 - Code: https://github.com/0liliulei/LIIR ================================================ FILE: CVPR2023-Papers-with-Code.md ================================================ # CVPR 2023 Papers and Open-Source Projects (Papers with Code) A collection of [CVPR 2023](https://openaccess.thecvf.com/CVPR2023?day=all) papers and open-source projects (papers with code)! **25.78% = 2360 / 9155** CVPR 2023 decisions are now available on OpenReview! This year, we received a record number of **9155** submissions (a 12% increase over CVPR 2022), and accepted **2360** papers, for a 25.78% acceptance rate. > Note 1: Issues sharing CVPR 2023 papers and open-source projects are welcome!
> > Note 2: For papers from past top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision > > - [CVPR 2019](CVPR2019-Papers-with-Code.md) > - [CVPR 2020](CVPR2020-Papers-with-Code.md) > - [CVPR 2021](CVPR2021-Papers-with-Code.md) > - [CVPR 2022](CVPR2022-Papers-with-Code.md) To follow the latest high-quality CV papers, open-source projects, and learning materials, scan the QR code to join the CVer academic discussion group (CVer学术交流群) and learn together. ![](CVer学术交流群.png) # CVPR 2023 Papers with Code: Table of Contents - [Backbone](#Backbone) - [CLIP](#CLIP) - [MAE](#MAE) - [GAN](#GAN) - [GNN](#GNN) - [MLP](#MLP) - [NAS](#NAS) - [OCR](#OCR) - [NeRF](#NeRF) - [DETR](#DETR) - [Prompt](#Prompt) - [Diffusion Models(扩散模型)](#Diffusion) - [Avatars](#Avatars) - [ReID(重识别)](#ReID) - [长尾分布(Long-Tail)](#Long-Tail) - [Vision Transformer](#Vision-Transformer) - [视觉和语言(Vision-Language)](#VL) - [自监督学习(Self-supervised Learning)](#SSL) - [数据增强(Data Augmentation)](#DA) - [目标检测(Object Detection)](#Object-Detection) - [目标跟踪(Visual Tracking)](#VT) - [语义分割(Semantic Segmentation)](#Semantic-Segmentation) - [实例分割(Instance Segmentation)](#Instance-Segmentation) - [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation) - [医学图像分割(Medical Image Segmentation)](#MIS) - [视频目标分割(Video Object Segmentation)](#VOS) - [视频实例分割(Video Instance Segmentation)](#VIS) - [参考图像分割(Referring Image Segmentation)](#RIS) - [图像抠图(Image Matting)](#Matting) - [图像编辑(Image Editing)](#Image-Editing) - [Low-level Vision](#LLV) - [超分辨率(Super-Resolution)](#SR) - [去噪(Denoising)](#Denoising) - [去模糊(Deblur)](#Deblur) - [3D点云(3D Point Cloud)](#3D-Point-Cloud) - [3D目标检测(3D Object Detection)](#3DOD) - [3D语义分割(3D Semantic Segmentation)](#3DSS) - [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking) - [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC) - [3D配准(3D Registration)](#3D-Registration) - [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation) - [3D人体Mesh估计(3D Human Mesh Estimation)](#3D-Human-Pose-Estimation) - [医学图像(Medical Image)](#Medical-Image) - [图像生成(Image Generation)](#Image-Generation) - [视频生成(Video Generation)](#Video-Generation) - [视频理解(Video
Understanding)](#Video-Understanding) - [行为检测(Action Detection)](#Action-Detection) - [文本检测(Text Detection)](#Text-Detection) - [知识蒸馏(Knowledge Distillation)](#KD) - [模型剪枝(Model Pruning)](#Pruning) - [图像压缩(Image Compression)](#IC) - [异常检测(Anomaly Detection)](#AD) - [三维重建(3D Reconstruction)](#3D-Reconstruction) - [深度估计(Depth Estimation)](#Depth-Estimation) - [轨迹预测(Trajectory Prediction)](#TP) - [车道线检测(Lane Detection)](#Lane-Detection) - [图像描述(Image Captioning)](#Image-Captioning) - [视觉问答(Visual Question Answering)](#VQA) - [手语识别(Sign Language Recognition)](#SLR) - [视频预测(Video Prediction)](#Video-Prediction) - [新视点合成(Novel View Synthesis)](#NVS) - [Zero-Shot Learning(零样本学习)](#ZSL) - [立体匹配(Stereo Matching)](#Stereo-Matching) - [特征匹配(Feature Matching)](#Feature-Matching) - [场景图生成(Scene Graph Generation)](#SGG) - [隐式神经表示(Implicit Neural Representations)](#INR) - [图像质量评价(Image Quality Assessment)](#IQA) - [数据集(Datasets)](#Datasets) - [新任务(New Tasks)](#New-Tasks) - [其他(Others)](#Others) # Backbone **Integrally Pre-Trained Transformer Pyramid Networks** - Paper: https://arxiv.org/abs/2211.12735 - Code: https://github.com/sunsmarterjie/iTPN **Stitchable Neural Networks** - Homepage: https://snnet.github.io/ - Paper: https://arxiv.org/abs/2302.06586 - Code: https://github.com/ziplab/SN-Net **Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks** - Paper: https://arxiv.org/abs/2303.03667 - Code: https://github.com/JierunChen/FasterNet **BiFormer: Vision Transformer with Bi-Level Routing Attention** - Paper: https://arxiv.org/abs/2303.08810 - Code: https://github.com/rayleizhu/BiFormer **DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network** - Paper: https://arxiv.org/abs/2303.02165 - Code: https://github.com/alibaba/lightweight-neural-architecture-search **Vision Transformer with Super Token Sampling** - Paper: https://arxiv.org/abs/2211.11167 - Code: https://github.com/hhb072/SViT **Hard Patches Mining for Masked Image Modeling** - Paper: None - Code: None
**SMPConv: Self-moving Point Representations for Continuous Convolution** - Paper: https://arxiv.org/abs/2304.02330 - Code: https://github.com/sangnekim/SMPConv **Making Vision Transformers Efficient from A Token Sparsification View** - Paper: https://arxiv.org/abs/2303.08685 - Code: https://github.com/changsn/STViT-R # CLIP **GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis** - Paper: https://arxiv.org/abs/2301.12959 - Code: https://github.com/tobran/GALIP **DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation** - Paper: https://arxiv.org/abs/2303.06285 - Code: https://github.com/Yueming6568/DeltaEdit # MAE **Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders** - Paper: https://arxiv.org/abs/2212.06785 - Code: https://github.com/ZrrSkywalker/I2P-MAE **Generic-to-Specific Distillation of Masked Autoencoders** - Paper: https://arxiv.org/abs/2302.14771 - Code: https://github.com/pengzhiliang/G2SD # GAN **DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation** - Paper: https://arxiv.org/abs/2303.06285 - Code: https://github.com/Yueming6568/DeltaEdit # NeRF **NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior** - Homepage: https://nope-nerf.active.vision/ - Paper: https://arxiv.org/abs/2212.07388 - Code: None **Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures** - Paper: https://arxiv.org/abs/2211.07600 - Code: https://github.com/eladrich/latent-nerf **NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis** - Paper: https://arxiv.org/abs/2301.08556 - Code: None **Panoptic Lifting for 3D Scene Understanding with Neural Fields** - Homepage: https://nihalsid.github.io/panoptic-lifting/ - Paper: https://arxiv.org/abs/2212.09802 - Code: None **NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer** - Homepage: https://redrock303.github.io/nerflix/ - Paper: 
https://arxiv.org/abs/2303.06919 - Code: None **HNeRV: A Hybrid Neural Representation for Videos** - Homepage: https://haochen-rye.github.io/HNeRV - Paper: https://arxiv.org/abs/2304.02633 - Code: https://github.com/haochen-rye/HNeRV # DETR **DETRs with Hybrid Matching** - Paper: https://arxiv.org/abs/2207.13080 - Code: https://github.com/HDETR # Prompt **Diversity-Aware Meta Visual Prompting** - Paper: https://arxiv.org/abs/2303.08138 - Code: https://github.com/shikiw/DAM-VP # NAS **PA&DA: Jointly Sampling PAth and DAta for Consistent NAS** - Paper: https://arxiv.org/abs/2302.14772 - Code: https://github.com/ShunLu91/PA-DA # Avatars **Structured 3D Features for Reconstructing Relightable and Animatable Avatars** - Homepage: https://enriccorona.github.io/s3f/ - Paper: https://arxiv.org/abs/2212.06820 - Code: None - Demo: https://www.youtube.com/watch?v=mcZGcQ6L-2s **Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos** - Homepage: https://augmentedperception.github.io/monoavatar/ - Paper: https://arxiv.org/abs/2304.01436 # ReID(重识别) **Clothing-Change Feature Augmentation for Person Re-Identification** - Paper: None - Code: None **MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID** - Paper: https://arxiv.org/abs/2303.07065 - Code: https://github.com/vimar-gu/MSINet **Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification** - Paper: https://arxiv.org/abs/2304.04205 - Code: None **Large-scale Training Data Search for Object Re-identification** - Paper: https://arxiv.org/abs/2303.16186 - Code: https://github.com/yorkeyao/SnP # Diffusion Models(扩散模型) **Video Probabilistic Diffusion Models in Projected Latent Space** - Homepage: https://sihyun.me/PVDM/ - Paper: https://arxiv.org/abs/2302.07685 - Code: https://github.com/sihyun-yu/PVDM **Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models** - Paper: https://arxiv.org/abs/2211.10655 - Code: None **Imagic: Text-Based Real Image 
Editing with Diffusion Models** - Homepage: https://imagic-editing.github.io/ - Paper: https://arxiv.org/abs/2210.09276 - Code: None **Parallel Diffusion Models of Operator and Image for Blind Inverse Problems** - Paper: https://arxiv.org/abs/2211.10656 - Code: None **DiffRF: Rendering-guided 3D Radiance Field Diffusion** - Homepage: https://sirwyver.github.io/DiffRF/ - Paper: https://arxiv.org/abs/2212.01206 - Code: None **MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation** - Paper: https://arxiv.org/abs/2212.09478 - Code: https://github.com/researchmm/MM-Diffusion **HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising** - Homepage: https://aminshabani.github.io/housediffusion/ - Paper: https://arxiv.org/abs/2211.13287 - Code: https://github.com/aminshabani/house_diffusion **TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets** - Paper: https://arxiv.org/abs/2303.05762 - Code: https://github.com/chenweixin107/TrojDiff **Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption** - Paper: https://arxiv.org/abs/2207.03442 - Code: https://github.com/shiyegao/DDA **DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration** - Paper: https://arxiv.org/abs/2303.06885 - Code: None **Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion** - Homepage: https://nv-tlabs.github.io/trace-pace/ - Paper: https://arxiv.org/abs/2304.01893 - Code: None **Generative Diffusion Prior for Unified Image Restoration and Enhancement** - Paper: https://arxiv.org/abs/2304.01247 - Code: None **Conditional Image-to-Video Generation with Latent Flow Diffusion Models** - Paper: https://arxiv.org/abs/2303.13744 - Code: https://github.com/nihaomiao/CVPR23_LFDM # 长尾分布(Long-Tail) **Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation** - Paper: https://arxiv.org/abs/2304.01279 - Code: None # Vision 
Transformer **Integrally Pre-Trained Transformer Pyramid Networks** - Paper: https://arxiv.org/abs/2211.12735 - Code: https://github.com/sunsmarterjie/iTPN **Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors** - Homepage: https://niessnerlab.org/projects/hou2023mask3d.html - Paper: https://arxiv.org/abs/2302.14746 - Code: None **Learning Trajectory-Aware Transformer for Video Super-Resolution** - Paper: https://arxiv.org/abs/2204.04216 - Code: https://github.com/researchmm/TTVSR **Vision Transformers are Parameter-Efficient Audio-Visual Learners** - Homepage: https://yanbo.ml/project_page/LAVISH/ - Code: https://github.com/GenjiB/LAVISH **Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes** - Paper: https://arxiv.org/abs/2303.04249 - Code: None **DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets** - Paper: https://arxiv.org/abs/2301.06051 - Code: https://github.com/Haiyang-W/DSVT **DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting** - Paper: https://arxiv.org/abs/2211.10772 - Code link: https://github.com/ViTAE-Transformer/DeepSolo **BiFormer: Vision Transformer with Bi-Level Routing Attention** - Paper: https://arxiv.org/abs/2303.08810 - Code: https://github.com/rayleizhu/BiFormer **Vision Transformer with Super Token Sampling** - Paper: https://arxiv.org/abs/2211.11167 - Code: https://github.com/hhb072/SViT **BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision** - Paper: https://arxiv.org/abs/2211.10439 - Code: None **BAEFormer: Bi-directional and Early Interaction Transformers for Bird’s Eye View Semantic Segmentation** - Paper: None - Code: None **Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention** - Paper: https://arxiv.org/abs/2304.03282 - Code: None **Making Vision Transformers Efficient from A Token Sparsification View** - Paper: 
https://arxiv.org/abs/2303.08685 - Code: https://github.com/changsn/STViT-R # 视觉和语言(Vision-Language) **GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods** - Paper: https://arxiv.org/abs/2301.01893 - Code: None **Teaching Structured Vision&Language Concepts to Vision&Language Models** - Paper: https://arxiv.org/abs/2211.11733 - Code: None **Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks** - Paper: https://arxiv.org/abs/2211.09808 - Code: https://github.com/fundamentalvision/Uni-Perceiver **Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training** - Paper: https://arxiv.org/abs/2303.00040 - Code: None **CapDet: Unifying Dense Captioning and Open-World Detection Pretraining** - Paper: https://arxiv.org/abs/2303.02489 - Code: None **FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks** - Paper: https://arxiv.org/abs/2303.02483 - Code: https://github.com/BrandonHanx/FAME-ViL **Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding** - Homepage: https://rllab-snu.github.io/projects/Meta-Explore/doc.html - Paper: https://arxiv.org/abs/2303.04077 - Code: None **All in One: Exploring Unified Video-Language Pre-training** - Paper: https://arxiv.org/abs/2203.07303 - Code: https://github.com/showlab/all-in-one **Position-guided Text Prompt for Vision Language Pre-training** - Paper: https://arxiv.org/abs/2212.09737 - Code: https://github.com/sail-sg/ptp **EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding** - Paper: https://arxiv.org/abs/2209.14941 - Code: https://github.com/yanmin-wu/EDA
**Align and Attend: Multimodal Summarization with Dual Contrastive Losses** - Homepage: https://boheumd.github.io/A2Summ/ - Paper: https://arxiv.org/abs/2303.07284 - Code: https://github.com/boheumd/A2Summ **Multi-Modal Representation Learning with Text-Driven Soft Masks** - Paper: https://arxiv.org/abs/2304.00719 - Code: None **Learning to Name Classes for Vision and Language Models** - Paper: https://arxiv.org/abs/2304.01830 - Code: None # 目标检测(Object Detection) **YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors** - Paper: https://arxiv.org/abs/2207.02696 - Code: https://github.com/WongKinYiu/yolov7 **DETRs with Hybrid Matching** - Paper: https://arxiv.org/abs/2207.13080 - Code: https://github.com/HDETR **Enhanced Training of Query-Based Object Detection via Selective Query Recollection** - Paper: https://arxiv.org/abs/2212.07593 - Code: https://github.com/Fangyi-Chen/SQR **Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection** - Paper: https://arxiv.org/abs/2303.05892 - Code: https://github.com/LutingWang/OADP # 目标跟踪(Object Tracking) **Simple Cues Lead to a Strong Multi-Object Tracker** - Paper: https://arxiv.org/abs/2206.04656 - Code: None **Joint Visual Grounding and Tracking with Natural Language Specification** - Paper: https://arxiv.org/abs/2303.12027 - Code: https://github.com/lizhou-cs/JointNLT # 语义分割(Semantic Segmentation) **Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos** - Paper: https://arxiv.org/abs/2303.07224 - Code: https://github.com/THU-LYJ-Lab/AR-Seg **FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding** - Paper: https://arxiv.org/abs/2304.02135 - Code: https://github.com/uark-cviu/FREDOM # 医学图像分割(Medical Image Segmentation) **Label-Free Liver Tumor Segmentation** - Paper: https://arxiv.org/abs/2303.14869 - Code: https://github.com/MrGiovanni/SyntheticTumors **Directional Connectivity-based Segmentation of Medical Images** - Paper: 
https://arxiv.org/abs/2304.00145 - Code: https://github.com/Zyun-Y/DconnNet **Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation** - Paper: https://arxiv.org/abs/2305.00673 - Code: https://github.com/DeepMed-Lab-ECNU/BCP **Devil is in the Queries: Advancing Mask Transformers for Real-world Medical Image Segmentation and Out-of-Distribution Localization** - Paper: https://arxiv.org/abs/2304.00212 - Code: None **Fair Federated Medical Image Segmentation via Client Contribution Estimation** - Paper: https://arxiv.org/abs/2303.16520 - Code: https://github.com/NVIDIA/NVFlare/tree/dev/research/fed-ce **Ambiguous Medical Image Segmentation using Diffusion Models** - Homepage: https://aimansnigdha.github.io/cimd/ - Paper: https://arxiv.org/abs/2304.04745 - Code: https://github.com/aimansnigdha/Ambiguous-Medical-Image-Segmentation-using-Diffusion-Models **Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation** - Paper: https://arxiv.org/abs/2303.13090 - Code: https://github.com/HengCai-NJU/DeSCO **MagicNet: Semi-Supervised Multi-Organ Segmentation via Magic-Cube Partition and Recovery** - Paper: https://arxiv.org/abs/2301.01767 - Code: https://github.com/DeepMed-Lab-ECNU/MagicNet **MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation** - Paper: https://openaccess.thecvf.com/content/CVPR2023/html/Wang_MCF_Mutual_Correction_Framework_for_Semi-Supervised_Medical_Image_Segmentation_CVPR_2023_paper.html - Code: https://github.com/WYC-321/MCF **Rethinking Few-Shot Medical Segmentation: A Vector Quantization View** - Paper: https://openaccess.thecvf.com/content/CVPR2023/html/Huang_Rethinking_Few-Shot_Medical_Segmentation_A_Vector_Quantization_View_CVPR_2023_paper.html - Code: None **Pseudo-label Guided Contrastive Learning for Semi-supervised Medical Image Segmentation** - Paper: 
https://openaccess.thecvf.com/content/CVPR2023/html/Basak_Pseudo-Label_Guided_Contrastive_Learning_for_Semi-Supervised_Medical_Image_Segmentation_CVPR_2023_paper.html - Code: https://github.com/hritam-98/PatchCL-MedSeg **SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation** - Paper: https://arxiv.org/abs/2305.11012 - Code: None **DoNet: Deep De-overlapping Network for Cytology Instance Segmentation** - Paper: https://arxiv.org/abs/2303.14373 - Code: https://github.com/DeepDoNet/DoNet # 视频目标分割(Video Object Segmentation) **Two-shot Video Object Segmentation** - Paper: https://arxiv.org/abs/2303.12078 - Code: https://github.com/yk-pku/Two-shot-Video-Object-Segmentation **Under Video Object Segmentation Section** - Paper: https://arxiv.org/abs/2303.07815 - Code: None # 视频实例分割(Video Instance Segmentation) **Mask-Free Video Instance Segmentation** - Paper: https://arxiv.org/abs/2303.15904 - Code: https://github.com/SysCV/MaskFreeVis # 参考图像分割(Referring Image Segmentation ) **PolyFormer: Referring Image Segmentation as Sequential Polygon Generation** - Paper: https://arxiv.org/abs/2302.07387 - Code: None # 3D点云(3D-Point-Cloud) **Physical-World Optical Adversarial Attacks on 3D Face Recognition** - Paper: https://arxiv.org/abs/2205.13412 - Code: https://github.com/PolyLiYJ/SLAttack.git **IterativePFN: True Iterative Point Cloud Filtering** - Paper: https://arxiv.org/abs/2304.01529 - Code: https://github.com/ddsediri/IterativePFN **Attention-based Point Cloud Edge Sampling** - Homepage: https://junweizheng93.github.io/publications/APES/APES.html - Paper: https://arxiv.org/abs/2302.14673 - Code: https://github.com/JunweiZheng93/APES # 3D目标检测(3D Object Detection) **DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets** - Paper: https://arxiv.org/abs/2301.06051 - Code: https://github.com/Haiyang-W/DSVT **FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection** - Paper: 
https://arxiv.org/abs/2301.04467 - Code: None **3D Video Object Detection with Learnable Object-Centric Global Optimization** - Paper: None - Code: None **Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection** - Paper: https://arxiv.org/abs/2304.01464 - Code: https://github.com/azhuantou/HSSDA # 3D语义分割(3D Semantic Segmentation) **Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation** - Paper: https://arxiv.org/abs/2303.11203 - Code: https://github.com/l1997i/lim3d # 3D语义场景补全(3D Semantic Scene Completion) **VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion** - Paper: https://arxiv.org/abs/2302.12251 - Code: https://github.com/NVlabs/VoxFormer # 3D配准(3D Registration) **Robust Outlier Rejection for 3D Registration with Variational Bayes** - Paper: https://arxiv.org/abs/2304.01514 - Code: https://github.com/Jiang-HB/VBReg # 3D人体姿态估计(3D Human Pose Estimation) # 3D人体Mesh估计(3D Human Mesh Estimation) **3D Human Mesh Estimation from Virtual Markers** - Paper: https://arxiv.org/abs/2303.11726 - Code: https://github.com/ShirleyMaxx/VirtualMarker # Low-level Vision **Causal-IR: Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective** - Paper: https://arxiv.org/abs/2303.06859 - Code: https://github.com/lixinustc/Casual-IR-DIL **Burstormer: Burst Image Restoration and Enhancement Transformer** - Paper: https://arxiv.org/abs/2304.01194 - Code: https://github.com/akshaydudhane16/Burstormer # 超分辨率(Super-Resolution) **Super-Resolution Neural Operator** - Paper: https://arxiv.org/abs/2303.02584 - Code: https://github.com/2y7c3/Super-Resolution-Neural-Operator ## 视频超分辨率(Video Super-Resolution) **Learning Trajectory-Aware Transformer for Video Super-Resolution** - Paper: https://arxiv.org/abs/2204.04216 - Code: https://github.com/researchmm/TTVSR # 去噪(Denoising) ## 图像去噪(Image Denoising) **Masked Image Training for Generalizable Deep Image Denoising** - Paper: https://arxiv.org/abs/2303.13132 - Code: 
https://github.com/haoyuc/MaskedDenoising # 图像生成(Image Generation) **GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis** - Paper: https://arxiv.org/abs/2301.12959 - Code: https://github.com/tobran/GALIP **MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis** - Paper: https://arxiv.org/abs/2211.09117 - Code: https://github.com/LTH14/mage **Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation** - Paper: https://arxiv.org/abs/2304.01816 - Code: None **Few-shot Semantic Image Synthesis with Class Affinity Transfer** - Paper: https://arxiv.org/abs/2304.02321 - Code: None **TopNet: Transformer-based Object Placement Network for Image Compositing** - Paper: https://arxiv.org/abs/2304.03372 - Code: None # 视频生成(Video Generation) **MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation** - Paper: https://arxiv.org/abs/2212.09478 - Code: https://github.com/researchmm/MM-Diffusion **Conditional Image-to-Video Generation with Latent Flow Diffusion Models** - Paper: https://arxiv.org/abs/2303.13744 - Code: https://github.com/nihaomiao/CVPR23_LFDM # 视频理解(Video Understanding) **Learning Transferable Spatiotemporal Representations from Natural Script Knowledge** - Paper: https://arxiv.org/abs/2209.15280 - Code: https://github.com/TencentARC/TVTS **Frame Flexible Network** - Paper: https://arxiv.org/abs/2303.14817 - Code: https://github.com/BeSpontaneous/FFN **Masked Motion Encoding for Self-Supervised Video Representation Learning** - Paper: https://arxiv.org/abs/2210.06096 - Code: https://github.com/XinyuSun/MME **MARLIN: Masked Autoencoder for facial video Representation LearnING** - Paper: https://arxiv.org/abs/2211.06627 - Code: https://github.com/ControlNet/MARLIN # 行为检测(Action Detection) **TriDet: Temporal Action Detection with Relative Boundary Modeling** - Paper: https://arxiv.org/abs/2303.07347 - Code: https://github.com/dingfengshi/TriDet # 文本检测(Text Detection) 
**DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting** - Paper: https://arxiv.org/abs/2211.10772 - Code: https://github.com/ViTAE-Transformer/DeepSolo # 知识蒸馏(Knowledge Distillation) **Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation** - Paper: https://arxiv.org/abs/2302.14290 - Code: None **Generic-to-Specific Distillation of Masked Autoencoders** - Paper: https://arxiv.org/abs/2302.14771 - Code: https://github.com/pengzhiliang/G2SD # 模型剪枝(Model Pruning) **DepGraph: Towards Any Structural Pruning** - Paper: https://arxiv.org/abs/2301.12900 - Code: https://github.com/VainF/Torch-Pruning # 图像压缩(Image Compression) **Context-Based Trit-Plane Coding for Progressive Image Compression** - Paper: https://arxiv.org/abs/2303.05715 - Code: https://github.com/seungminjeon-github/CTC # 异常检测(Anomaly Detection) **Deep Feature In-painting for Unsupervised Anomaly Detection in X-ray Images** - Paper: https://arxiv.org/abs/2111.13495 - Code: https://github.com/tiangexiang/SQUID # 三维重建(3D Reconstruction) **OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields** - Paper: https://arxiv.org/abs/2211.12886 - Code: None **SparsePose: Sparse-View Camera Pose Regression and Refinement** - Paper: https://arxiv.org/abs/2211.16991 - Code: None **NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction** - Paper: https://arxiv.org/abs/2303.02375 - Code: None **Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition** - Homepage: https://moygcc.github.io/vid2avatar/ - Paper: https://arxiv.org/abs/2302.11566 - Code: https://github.com/MoyGcc/vid2avatar - Demo: https://youtu.be/EGi47YeIeGQ **To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision** - Paper: https://arxiv.org/abs/2106.09614 - Code: https://github.com/unibas-gravis/Occlusion-Robust-MoFA **Structural 
Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction** - Paper: https://arxiv.org/abs/2303.05937 - Code: None **3D Cinemagraphy from a Single Image** - Homepage: https://xingyi-li.github.io/3d-cinemagraphy/ - Paper: https://arxiv.org/abs/2303.05724 - Code: https://github.com/xingyi-li/3d-cinemagraphy **Revisiting Rotation Averaging: Uncertainties and Robust Losses** - Paper: https://arxiv.org/abs/2303.05195 - Code: https://github.com/zhangganlin/GlobalSfMpy **FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction** - Paper: https://arxiv.org/abs/2211.13874 - Code: https://github.com/csbhr/FFHQ-UV **A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images** - Homepage: https://younglbw.github.io/HRN-homepage/ - Paper: https://arxiv.org/abs/2302.14434 - Code: https://github.com/youngLBW/HRN # 深度估计(Depth Estimation) **Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation** - Paper: https://arxiv.org/abs/2211.13202 - Code: https://github.com/noahzn/Lite-Mono # 轨迹预测(Trajectory Prediction) **IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction** - Paper: https://arxiv.org/abs/2303.00575 - Code: None **EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning** - Paper: https://arxiv.org/abs/2303.10876 - Code: https://github.com/MediaBrain-SJTU/EqMotion # 车道线检测(Lane Detection) **Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection** - Paper: https://arxiv.org/abs/2301.02371 - Code: https://github.com/tusen-ai/Anchor3DLane **BEV-LaneDet: An Efficient 3D Lane Detection Based on Virtual Camera via Key-Points** - Paper: https://arxiv.org/abs/2210.06006v3 - Code: https://github.com/gigo-team/bev_lane_det # 图像描述(Image Captioning) **ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing** - Paper: 
https://arxiv.org/abs/2303.02437 - Code: None **Cross-Domain Image Captioning with Discriminative Finetuning** - Paper: https://arxiv.org/abs/2304.01662 - Code: None **Model-Agnostic Gender Debiased Image Captioning** - Paper: https://arxiv.org/abs/2304.03693 - Code: None # 视觉问答(Visual Question Answering) **MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering** - Paper: https://arxiv.org/abs/2303.01239 - Code: https://github.com/jingjing12110/MixPHM # 手语识别(Sign Language Recognition) **Continuous Sign Language Recognition with Correlation Network** - Paper: https://arxiv.org/abs/2303.03202 - Code: https://github.com/hulianyuyy/CorrNet # 视频预测(Video Prediction) **MOSO: Decomposing MOtion, Scene and Object for Video Prediction** - Paper: https://arxiv.org/abs/2303.03684 - Code: https://github.com/anonymous202203/MOSO # 新视点合成(Novel View Synthesis) **3D Video Loops from Asynchronous Input** - Homepage: https://limacv.github.io/VideoLoop3D_web/ - Paper: https://arxiv.org/abs/2303.05312 - Code: https://github.com/limacv/VideoLoop3D # Zero-Shot Learning(零样本学习) **Bi-directional Distribution Alignment for Transductive Zero-Shot Learning** - Paper: https://arxiv.org/abs/2303.08698 - Code: https://github.com/Zhicaiwww/Bi-VAEGAN **Semantic Prompt for Few-Shot Learning** - Paper: None - Code: None # 立体匹配(Stereo Matching) **Iterative Geometry Encoding Volume for Stereo Matching** - Paper: https://arxiv.org/abs/2303.06615 - Code: https://github.com/gangweiX/IGEV **Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation** - Paper: https://arxiv.org/abs/2304.00152 - Code: None # 特征匹配(Feature Matching) **Adaptive Spot-Guided Transformer for Consistent Local Feature Matching** - Homepage: [https://astr2023.github.io](https://astr2023.github.io/) - Paper: https://arxiv.org/abs/2303.16624 - Code: https://github.com/ASTR2023/ASTR # 场景图生成(Scene Graph Generation) **Prototype-based Embedding Network for Scene 
Graph Generation** - Paper: https://arxiv.org/abs/2303.07096 - Code: None # 隐式神经表示(Implicit Neural Representations) **Polynomial Implicit Neural Representations For Large Diverse Datasets** - Paper: https://arxiv.org/abs/2303.11424 - Code: https://github.com/Rajhans0/Poly_INR # 图像质量评价(Image Quality Assessment) **Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild** - Paper: https://arxiv.org/abs/2304.00451 - Code: None # 数据集(Datasets) **Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes** - Paper: https://arxiv.org/abs/2303.02760 - Code: None **Align and Attend: Multimodal Summarization with Dual Contrastive Losses** - Homepage: https://boheumd.github.io/A2Summ/ - Paper: https://arxiv.org/abs/2303.07284 - Code: https://github.com/boheumd/A2Summ **GeoNet: Benchmarking Unsupervised Adaptation across Geographies** - Homepage: https://tarun005.github.io/GeoNet/ - Paper: https://arxiv.org/abs/2303.15443 **CelebV-Text: A Large-Scale Facial Text-Video Dataset** - Homepage: https://celebv-text.github.io/ - Paper: https://arxiv.org/abs/2303.14717 # 其他(Others) **Interactive Segmentation as Gaussian Process Classification** - Paper: https://arxiv.org/abs/2302.14578 - Code: None **Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger** - Paper: https://arxiv.org/abs/2302.14677 - Code: None **SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries** - Homepage: http://bit.ly/splinecam - Paper: https://arxiv.org/abs/2302.12828 - Code: None **SCOTCH and SODA: A Transformer Video Shadow Detection Framework** - Paper: https://arxiv.org/abs/2211.06885 - Code: None **DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization** - Homepage: https://ai4ce.github.io/DeepMapping2/ - Paper: https://arxiv.org/abs/2212.06331 - Code: https://github.com/ai4ce/DeepMapping2 **RelightableHands: Efficient Neural Relighting of Articulated Hand Models** - Homepage: 
https://sh8.io/#/relightable_hands - Paper: https://arxiv.org/abs/2302.04866 - Code: None - Demo: https://sh8.io/static/media/teacher_video.923d87957fe0610730c2.mp4 **Token Turing Machines** - Paper: https://arxiv.org/abs/2211.09119 - Code: None **Single Image Backdoor Inversion via Robust Smoothed Classifiers** - Paper: https://arxiv.org/abs/2303.00215 - Code: https://github.com/locuslab/smoothinv **To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision** - Paper: https://arxiv.org/abs/2106.09614 - Code: https://github.com/unibas-gravis/Occlusion-Robust-MoFA **HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics** - Homepage: https://dolorousrtur.github.io/hood/ - Paper: https://arxiv.org/abs/2212.07242 - Code: https://github.com/dolorousrtur/hood - Demo: https://www.youtube.com/watch?v=cBttMDPrUYY **A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others** - Paper: https://arxiv.org/abs/2212.04825 - Code: https://github.com/facebookresearch/Whac-A-Mole.git **Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation** - Paper: https://arxiv.org/abs/2303.00914 - Code: None **Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression** - Paper: https://arxiv.org/abs/2303.01052 - Code: None **UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy** - Paper: https://arxiv.org/abs/2303.00938 - Code: None **Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness** - Paper: https://arxiv.org/abs/2303.00971 - Code: https://github.com/zhijieshen-bjtu/DOPNet **Learning Neural Parametric Head 
Models** - Homepage: https://simongiebenhain.github.io/NPHM - Paper: https://arxiv.org/abs/2212.02761 - Code: None **A Meta-Learning Approach to Predicting Performance and Data Requirements** - Paper: https://arxiv.org/abs/2303.01598 - Code: None **MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision** - Homepage: https://imagine.enpc.fr/~guedona/MACARONS/ - Paper: https://arxiv.org/abs/2303.03315 - Code: None **Masked Images Are Counterfactual Samples for Robust Fine-tuning** - Paper: https://arxiv.org/abs/2303.03052 - Code: None **HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling** - Paper: https://arxiv.org/abs/2303.02700 - Code: None **Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization** - Paper: https://arxiv.org/abs/2303.02328 - Code: None **Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization** - Paper: https://arxiv.org/abs/2303.03108 - Code: None **Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples** - Paper: https://arxiv.org/abs/2301.01217 - Code: https://github.com/jiamingzhang94/Unlearnable-Clusters **Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes** - Paper: https://arxiv.org/abs/2303.04249 - Code: None **UniHCP: A Unified Model for Human-Centric Perceptions** - Paper: https://arxiv.org/abs/2303.02936 - Code: https://github.com/OpenGVLab/UniHCP **CUDA: Convolution-based Unlearnable Datasets** - Paper: https://arxiv.org/abs/2303.04278 - Code: https://github.com/vinusankars/Convolution-based-Unlearnability **AdaptiveMix: Robust Feature Representation via Shrinking Feature Space** - Paper: https://arxiv.org/abs/2303.01559 - Code: https://github.com/WentianZhang-ML/AdaptiveMix **Physical-World 
Optical Adversarial Attacks on 3D Face Recognition** - Paper: https://arxiv.org/abs/2205.13412 - Code: https://github.com/PolyLiYJ/SLAttack.git **DPE: Disentanglement of Pose and Expression for General Video Portrait Editing** - Paper: https://arxiv.org/abs/2301.06281 - Code: https://carlyx.github.io/DPE/ **SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation** - Paper: https://arxiv.org/abs/2211.12194 - Code: https://github.com/Winfredy/SadTalker **Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models** - Paper: None - Code: None **Sharpness-Aware Gradient Matching for Domain Generalization** - Paper: None - Code: https://github.com/Wang-pengfei/SAGM **Mind the Label-shift for Augmentation-based Graph Out-of-distribution Generalization** - Paper: None - Code: None **Blind Video Deflickering by Neural Filtering with a Flawed Atlas** - Homepage: https://chenyanglei.github.io/deflicker - Paper: None - Code: None **RiDDLE: Reversible and Diversified De-identification with Latent Encryptor** - Paper: None - Code: https://github.com/ldz666666/RiDDLE **PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation** - Paper: https://arxiv.org/abs/2303.07337 - Code: None **Upcycling Models under Domain and Category Shift** - Paper: https://arxiv.org/abs/2303.07110 - Code: https://github.com/ispc-lab/GLC **Modality-Agnostic Debiasing for Single Domain Generalization** - Paper: https://arxiv.org/abs/2303.07123 - Code: None **Progressive Open Space Expansion for Open-Set Model Attribution** - Paper: https://arxiv.org/abs/2303.06877 - Code: None **Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies** - Paper: https://arxiv.org/abs/2303.06856 - Code: None **GFPose: Learning 3D Human Pose Prior with Gradient Fields** - Paper: https://arxiv.org/abs/2212.08641 - Code: https://github.com/Embracing/GFPose **PRISE: 
Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment** - Paper: https://arxiv.org/abs/2303.11526 - Code: https://github.com/Zhang-VISLab **Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings** - Paper: https://arxiv.org/abs/2303.11502 - Code: None **Boundary Unlearning** - Paper: https://arxiv.org/abs/2303.11570 - Code: None **ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing** - Paper: https://arxiv.org/abs/2303.17096 - Code: https://github.com/alibaba/easyrobust **Zero-shot Model Diagnosis** - Paper: https://arxiv.org/abs/2303.15441 - Code: None **GeoNet: Benchmarking Unsupervised Adaptation across Geographies** - Homepage: https://tarun005.github.io/GeoNet/ - Paper: https://arxiv.org/abs/2303.15443 **Quantum Multi-Model Fitting** - Paper: https://arxiv.org/abs/2303.15444 - Code: https://github.com/FarinaMatteo/qmmf **DivClust: Controlling Diversity in Deep Clustering** - Paper: https://arxiv.org/abs/2304.01042 - Code: None **Neural Volumetric Memory for Visual Locomotion Control** - Homepage: https://rchalyang.github.io/NVM - Paper: https://arxiv.org/abs/2304.01201 - Code: https://rchalyang.github.io/NVM **MonoHuman: Animatable Human Neural Field from Monocular Video** - Homepage: https://yzmblog.github.io/projects/MonoHuman/ - Paper: https://arxiv.org/abs/2304.02001 - Code: https://github.com/Yzmblog/MonoHuman **Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion** - Homepage: https://nv-tlabs.github.io/trace-pace/ - Paper: https://arxiv.org/abs/2304.01893 - Code: None **Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification** - Paper: https://arxiv.org/abs/2304.01804 - Code: None **HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering** - Paper: https://arxiv.org/abs/2304.01686 - Code: None **On the Stability-Plasticity Dilemma of Class-Incremental Learning** - Paper: 
https://arxiv.org/abs/2304.01663 - Code: None **Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning** - Paper: https://arxiv.org/abs/2304.01482 - Code: None **VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution** - Paper: https://arxiv.org/abs/2304.01434 - Code: https://github.com/jaeill/CVPR23-VNE **Detecting and Grounding Multi-Modal Media Manipulation** - Homepage: https://rshaojimmy.github.io/Projects/MultiModal-DeepFake - Paper: https://arxiv.org/abs/2304.02556 - Code: https://github.com/rshaojimmy/MultiModal-DeepFake **Meta-causal Learning for Single Domain Generalization** - Paper: https://arxiv.org/abs/2304.03709 - Code: None **Disentangling Writer and Character Styles for Handwriting Generation** - Paper: https://arxiv.org/abs/2303.14736 - Code: https://github.com/dailenson/SDT **DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects** - Homepage: https://www.chenbao.tech/dexart/ - Code: https://github.com/Kami-code/dexart-release **Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision** - Homepage: https://toytiny.github.io/publication/23-cmflow-cvpr/index.html - Paper: https://arxiv.org/abs/2303.00462 - Code: https://github.com/Toytiny/CMFlow **Marching-Primitives: Shape Abstraction from Signed Distance Function** - Paper: https://arxiv.org/abs/2303.13190 - Code: https://github.com/ChirikjianLab/Marching-Primitives **Towards Trustable Skin Cancer Diagnosis via Rewriting Model's Decision** - Paper: https://arxiv.org/abs/2303.00885 - Code: None ================================================ FILE: CVPR2024-Papers-with-Code.md ================================================ # CVPR 2024 Papers and Open-Source Projects (Papers with Code) CVPR 2024 decisions are now available on OpenReview! > Note 1: Issues sharing CVPR 2024 papers and open-source projects are welcome!
> > Note 2: For top CV conference papers from previous years, other high-quality CV papers, and roundups, see https://github.com/amusi/daily-paper-computer-vision > > - [ECCV 2024](https://github.com/amusi/ECCV2024-Papers-with-Code) > - [CVPR 2023](CVPR2023-Papers-with-Code.md) Scan the QR code to join the CVer Academic Exchange Group (CVer学术交流群), which bills itself as the largest computer vision and AI Knowledge Planet community. It is updated daily and shares the latest learning materials on computer vision, AI painting, image processing, deep learning, autonomous driving, medical imaging, AIGC, and related topics. ![](CVer学术交流群.png) # CVPR 2024 Open-Source Paper Index - [3DGS(Gaussian Splatting)](#3DGS) - [Avatars](#Avatars) - [Backbone](#Backbone) - [CLIP](#CLIP) - [MAE](#MAE) - [Embodied AI](#Embodied-AI) - [GAN](#GAN) - [GNN](#GNN) - [多模态大语言模型(MLLM)](#MLLM) - [大语言模型(LLM)](#LLM) - [NAS](#NAS) - [OCR](#OCR) - [NeRF](#NeRF) - [DETR](#DETR) - [Prompt](#Prompt) - [扩散模型(Diffusion Models)](#Diffusion) - [ReID(重识别)](#ReID) - [长尾分布(Long-Tail)](#Long-Tail) - [Vision Transformer](#Vision-Transformer) - [视觉和语言(Vision-Language)](#VL) - [自监督学习(Self-supervised Learning)](#SSL) - [数据增强(Data Augmentation)](#DA) - [目标检测(Object Detection)](#Object-Detection) - [异常检测(Anomaly Detection)](#Anomaly-Detection) - [目标跟踪(Visual Tracking)](#VT) - [语义分割(Semantic Segmentation)](#Semantic-Segmentation) - [实例分割(Instance Segmentation)](#Instance-Segmentation) - [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation) - [医学图像(Medical Image)](#MI) - [医学图像分割(Medical Image Segmentation)](#MIS) - [视频目标分割(Video Object Segmentation)](#VOS) - [视频实例分割(Video Instance Segmentation)](#VIS) - [参考图像分割(Referring Image Segmentation)](#RIS) - [图像抠图(Image Matting)](#Matting) - [图像编辑(Image Editing)](#Image-Editing) - [Low-level Vision](#LLV) - [超分辨率(Super-Resolution)](#SR) - [去噪(Denoising)](#Denoising) - [去模糊(Deblur)](#Deblur) - [自动驾驶(Autonomous Driving)](#Autonomous-Driving) - [3D点云(3D Point Cloud)](#3D-Point-Cloud) - [3D目标检测(3D Object Detection)](#3DOD) - [3D语义分割(3D Semantic Segmentation)](#3DSS) - [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking) - [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC) - [3D配准(3D Registration)](#3D-Registration) - [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation) - [3D人体Mesh估计(3D Human Mesh 
Estimation)](#3D-Human-Pose-Estimation) - [医学图像(Medical Image)](#Medical-Image) - [图像生成(Image Generation)](#Image-Generation) - [视频生成(Video Generation)](#Video-Generation) - [3D生成(3D Generation)](#3D-Generation) - [视频理解(Video Understanding)](#Video-Understanding) - [行为检测(Action Detection)](#Action-Detection) - [文本检测(Text Detection)](#Text-Detection) - [知识蒸馏(Knowledge Distillation)](#KD) - [模型剪枝(Model Pruning)](#Pruning) - [图像压缩(Image Compression)](#IC) - [三维重建(3D Reconstruction)](#3D-Reconstruction) - [深度估计(Depth Estimation)](#Depth-Estimation) - [轨迹预测(Trajectory Prediction)](#TP) - [车道线检测(Lane Detection)](#Lane-Detection) - [图像描述(Image Captioning)](#Image-Captioning) - [视觉问答(Visual Question Answering)](#VQA) - [手语识别(Sign Language Recognition)](#SLR) - [视频预测(Video Prediction)](#Video-Prediction) - [新视点合成(Novel View Synthesis)](#NVS) - [Zero-Shot Learning(零样本学习)](#ZSL) - [立体匹配(Stereo Matching)](#Stereo-Matching) - [特征匹配(Feature Matching)](#Feature-Matching) - [场景图生成(Scene Graph Generation)](#SGG) - [隐式神经表示(Implicit Neural Representations)](#INR) - [图像质量评价(Image Quality Assessment)](#IQA) - [视频质量评价(Video Quality Assessment)](#Video-Quality-Assessment) - [数据集(Datasets)](#Datasets) - [新任务(New Tasks)](#New-Tasks) - [其他(Others)](#Others) # 3DGS(Gaussian Splatting) **Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering** - Homepage: https://city-super.github.io/scaffold-gs/ - Paper: https://arxiv.org/abs/2312.00109 - Code: https://github.com/city-super/Scaffold-GS **GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis** - Homepage: https://shunyuanzheng.github.io/GPS-Gaussian - Paper: https://arxiv.org/abs/2312.02155 - Code: https://github.com/ShunyuanZheng/GPS-Gaussian **GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians** - Paper: https://arxiv.org/abs/2312.02134 - Code: https://github.com/huliangxiao/GaussianAvatar **GaussianEditor: Swift and Controllable 
3D Editing with Gaussian Splatting** - Paper: https://arxiv.org/abs/2311.14521 - Code: https://github.com/buaacyw/GaussianEditor **Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction** - Homepage: https://ingra14m.github.io/Deformable-Gaussians/ - Paper: https://arxiv.org/abs/2309.13101 - Code: https://github.com/ingra14m/Deformable-3D-Gaussians **SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes** - Homepage: https://yihua7.github.io/SC-GS-web/ - Paper: https://arxiv.org/abs/2312.14937 - Code: https://github.com/yihua7/SC-GS **Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis** - Homepage: https://oppo-us-research.github.io/SpacetimeGaussians-website/ - Paper: https://arxiv.org/abs/2312.16812 - Code: https://github.com/oppo-us-research/SpacetimeGaussians **DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization** - Homepage: https://fictionarry.github.io/DNGaussian/ - Paper: https://arxiv.org/abs/2403.06912 - Code: https://github.com/Fictionarry/DNGaussian **4D Gaussian Splatting for Real-Time Dynamic Scene Rendering** - Paper: https://arxiv.org/abs/2310.08528 - Code: https://github.com/hustvl/4DGaussians **GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models** - Paper: https://arxiv.org/abs/2310.08529 - Code: https://github.com/hustvl/GaussianDreamer # Avatars **GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians** - Paper: https://arxiv.org/abs/2312.02134 - Code: https://github.com/huliangxiao/GaussianAvatar **Real-Time Simulated Avatar from Head-Mounted Sensors** - Homepage: https://www.zhengyiluo.com/SimXR/ - Paper: https://arxiv.org/abs/2403.06862 # Backbone **RepViT: Revisiting Mobile CNN From ViT Perspective** - Paper: https://arxiv.org/abs/2307.09283 - Code: https://github.com/THU-MIG/RepViT **TransNeXt: Robust Foveal Visual Perception for 
Vision Transformers** - Paper: https://arxiv.org/abs/2311.17132 - Code: https://github.com/DaiShiResearch/TransNeXt # CLIP **Alpha-CLIP: A CLIP Model Focusing on Wherever You Want** - Paper: https://arxiv.org/abs/2312.03818 - Code: https://github.com/SunzeY/AlphaCLIP **FairCLIP: Harnessing Fairness in Vision-Language Learning** - Paper: https://arxiv.org/abs/2403.19949 - Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP # MAE # Embodied AI **EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI** - Homepage: https://tai-wang.github.io/embodiedscan/ - Paper: https://arxiv.org/abs/2312.16170 - Code: https://github.com/OpenRobotLab/EmbodiedScan **MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception** - Homepage: https://iranqin.github.io/MP5.github.io/ - Paper: https://arxiv.org/abs/2312.07472 - Code: https://github.com/IranQin/MP5 **LEMON: Learning 3D Human-Object Interaction Relation from 2D Images** - Paper: https://arxiv.org/abs/2312.08963 - Code: https://github.com/yyvhang/lemon_3d # GAN # OCR **An Empirical Study of Scaling Law for OCR** - Paper: https://arxiv.org/abs/2401.00028 - Code: https://github.com/large-ocr-model/large-ocr-model.github.io **ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting** - Paper: https://arxiv.org/abs/2403.00303 - Code: https://github.com/PriNing/ODM # NeRF **PIE-NeRF🍕: Physics-based Interactive Elastodynamics with NeRF** - Paper: https://arxiv.org/abs/2311.13099 - Code: https://github.com/FYTalon/pienerf/ # DETR **DETRs Beat YOLOs on Real-time Object Detection** - Paper: https://arxiv.org/abs/2304.08069 - Code: https://github.com/lyuwenyu/RT-DETR **Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement** - Paper: https://arxiv.org/abs/2403.16131 - Code: https://github.com/xiuqhou/Salience-DETR # Prompt # 多模态大语言模型(MLLM) **mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with 
Modality Collaboration** - Paper: https://arxiv.org/abs/2311.04257 - Code: https://github.com/X-PLUG/mPLUG-Owl/tree/main/mPLUG-Owl2 **Link-Context Learning for Multimodal LLMs** - Paper: https://arxiv.org/abs/2308.07891 - Code: https://github.com/isekai-portal/Link-Context-Learning/tree/main **OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation** - Paper: https://arxiv.org/abs/2311.17911 - Code: https://github.com/shikiw/OPERA **Making Large Multimodal Models Understand Arbitrary Visual Prompts** - Homepage: https://vip-llava.github.io/ - Paper: https://arxiv.org/abs/2312.00784 **Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs** - Paper: https://arxiv.org/abs/2310.00582 - Code: https://github.com/SY-Xuan/Pink **Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding** - Paper: https://arxiv.org/abs/2311.08046 - Code: https://github.com/PKU-YuanGroup/Chat-UniVi **OneLLM: One Framework to Align All Modalities with Language** - Paper: https://arxiv.org/abs/2312.03700 - Code: https://github.com/csuhan/OneLLM # 大语言模型(LLM) **VTimeLLM: Empower LLM to Grasp Video Moments** - Paper: https://arxiv.org/abs/2311.18445 - Code: https://github.com/huangb23/VTimeLLM # NAS # ReID(重识别) **Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification** - Paper: https://arxiv.org/abs/2403.10254 - Code: https://github.com/924973292/EDITOR **Noisy-Correspondence Learning for Text-to-Image Person Re-identification** - Paper: https://arxiv.org/abs/2308.09911 - Code: https://github.com/QinYang79/RDE # 扩散模型(Diffusion Models) **InstanceDiffusion: Instance-level Control for Image Generation** - Homepage: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/ - Paper: https://arxiv.org/abs/2402.03290 - Code: https://github.com/frank-xwang/InstanceDiffusion **Residual Denoising Diffusion Models** - Paper: 
https://arxiv.org/abs/2308.13712
- Code: https://github.com/nachifur/RDDM

**DeepCache: Accelerating Diffusion Models for Free**

- Paper: https://arxiv.org/abs/2312.00858
- Code: https://github.com/horseee/DeepCache

**DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations**

- Homepage: https://tianhao-qi.github.io/DEADiff/
- Paper: https://arxiv.org/abs/2403.06951
- Code: https://github.com/Tianhao-Qi/DEADiff_code

**SVGDreamer: Text Guided SVG Generation with Diffusion Model**

- Paper: https://arxiv.org/abs/2312.16476
- Code: https://ximinng.github.io/SVGDreamer-project/

**InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model**

- Paper: https://arxiv.org/abs/2312.05849
- Code: https://github.com/jiuntian/interactdiffusion

**MMA-Diffusion: MultiModal Attack on Diffusion Models**

- Paper: https://arxiv.org/abs/2311.17516
- Code: https://github.com/yangyijune/MMA-Diffusion

**VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models**

- Homepage: https://video-motion-customization.github.io/
- Paper: https://arxiv.org/abs/2312.00845
- Code: https://github.com/HyeonHo99/Video-Motion-Customization

# Vision Transformer

**TransNeXt: Robust Foveal Visual Perception for Vision Transformers**

- Paper: https://arxiv.org/abs/2311.17132
- Code: https://github.com/DaiShiResearch/TransNeXt

**RepViT: Revisiting Mobile CNN From ViT Perspective**

- Paper: https://arxiv.org/abs/2307.09283
- Code: https://github.com/THU-MIG/RepViT

**A General and Efficient Training for Transformer via Token Expansion**

- Paper: https://arxiv.org/abs/2404.00672
- Code: https://github.com/Osilly/TokenExpansion

# Vision-Language

**PromptKD: Unsupervised Prompt Distillation for Vision-Language Models**

- Paper: https://arxiv.org/abs/2403.02781
- Code: https://github.com/zhengli97/PromptKD

**FairCLIP: Harnessing Fairness in Vision-Language Learning**

- Paper: https://arxiv.org/abs/2403.19949
- Code:
https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP # 目标检测(Object Detection) **DETRs Beat YOLOs on Real-time Object Detection** - Paper: https://arxiv.org/abs/2304.08069 - Code: https://github.com/lyuwenyu/RT-DETR **Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation** - Paper: https://arxiv.org/abs/2312.01220 - Code: https://github.com/ZPDu/Boosting-Object-Detection-with-Zero-Shot-Day-Night-Domain-Adaptation **YOLO-World: Real-Time Open-Vocabulary Object Detection** - Paper: https://arxiv.org/abs/2401.17270 - Code: https://github.com/AILab-CVC/YOLO-World **Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement** - Paper: https://arxiv.org/abs/2403.16131 - Code: https://github.com/xiuqhou/Salience-DETR # 异常检测(Anomaly Detection) **Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection** - Paper: https://arxiv.org/abs/2310.12790 - Code: https://github.com/mala-lab/AHL # 目标跟踪(Object Tracking) **Delving into the Trajectory Long-tail Distribution for Muti-object Tracking** - Paper: https://arxiv.org/abs/2403.04700 - Code: https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT # 语义分割(Semantic Segmentation) **Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation** - Paper: https://arxiv.org/abs/2312.04265 - Code: https://github.com/w1oves/Rein **SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation** - Paper: https://arxiv.org/abs/2311.15537 - Code: https://github.com/xb534/SED # 医学图像(Medical Image) **Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology** - Paper: https://arxiv.org/abs/2402.17228 - Code: https://github.com/DearCaat/RRT-MIL **VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis** - Paper: https://arxiv.org/abs/2402.17300 - Code: https://github.com/Luffy03/VoCo **ChAda-ViT : Channel Adaptive Attention for 
Joint Representation Learning of Heterogeneous Microscopy Images** - Paper: https://arxiv.org/abs/2311.15264 - Code: https://github.com/nicoboou/chada_vit # 医学图像分割(Medical Image Segmentation) # 自动驾驶(Autonomous Driving) **UniPAD: A Universal Pre-training Paradigm for Autonomous Driving** - Paper: https://arxiv.org/abs/2310.08370 - Code: https://github.com/Nightmare-n/UniPAD **Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications** - Paper: https://arxiv.org/abs/2311.17663 - Code: https://github.com/haomo-ai/Cam4DOcc **Memory-based Adapters for Online 3D Scene Perception** - Paper: https://arxiv.org/abs/2403.06974 - Code: https://github.com/xuxw98/Online3D **Symphonize 3D Semantic Scene Completion with Contextual Instance Queries** - Paper: https://arxiv.org/abs/2306.15670 - Code: https://github.com/hustvl/Symphonies **A Real-world Large-scale Dataset for Roadside Cooperative Perception** - Paper: https://arxiv.org/abs/2403.10145 - Code: https://github.com/AIR-THU/DAIR-RCooper **Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving** - Paper: https://arxiv.org/abs/2403.07535 - Code: https://github.com/Junda24/AFNet **Traffic Scene Parsing through the TSP6K Dataset** - Paper: https://arxiv.org/pdf/2303.02835.pdf - Code: https://github.com/PengtaoJiang/TSP6K # 3D点云(3D-Point-Cloud) # 3D目标检测(3D Object Detection) **PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection** - Paper: https://arxiv.org/abs/2312.08371 - Code: https://github.com/kuanchihhuang/PTT **UniMODE: Unified Monocular 3D Object Detection** - Paper: https://arxiv.org/abs/2402.18573 # 3D语义分割(3D Semantic Segmentation) # 图像编辑(Image Editing) **Edit One for All: Interactive Batch Image Editing** - Homepage: https://thaoshibe.github.io/edit-one-for-all - Paper: https://arxiv.org/abs/2401.10219 - Code: https://github.com/thaoshibe/edit-one-for-all # 视频编辑(Video Editing) **MaskINT: Video Editing via Interpolative Non-autoregressive 
Masked Transformers** - Homepage: [https://maskint.github.io](https://maskint.github.io/) - Paper: https://arxiv.org/abs/2312.12468 # Low-level Vision **Residual Denoising Diffusion Models** - Paper: https://arxiv.org/abs/2308.13712 - Code: https://github.com/nachifur/RDDM **Boosting Image Restoration via Priors from Pre-trained Models** - Paper: https://arxiv.org/abs/2403.06793 # 超分辨率(Super-Resolution) **SeD: Semantic-Aware Discriminator for Image Super-Resolution** - Paper: https://arxiv.org/abs/2402.19387 - Code: https://github.com/lbc12345/SeD **APISR: Anime Production Inspired Real-World Anime Super-Resolution** - Paper: https://arxiv.org/abs/2403.01598 - Code: https://github.com/Kiteretsu77/APISR # 去噪(Denoising) ## 图像去噪(Image Denoising) # 3D人体姿态估计(3D Human Pose Estimation) **Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation** - Paper: https://arxiv.org/abs/2311.12028 - Code: https://github.com/NationalGAILab/HoT # 图像生成(Image Generation) **InstanceDiffusion: Instance-level Control for Image Generation** - Homepage: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/ - Paper: https://arxiv.org/abs/2402.03290 - Code: https://github.com/frank-xwang/InstanceDiffusion **ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations** - Homepage: https://eclipse-t2i.vercel.app/ - Paper: https://arxiv.org/abs/2312.04655 - Code: https://github.com/eclipse-t2i/eclipse-inference **Instruct-Imagen: Image Generation with Multi-modal Instruction** - Paper: https://arxiv.org/abs/2401.01952 **Residual Denoising Diffusion Models** - Paper: https://arxiv.org/abs/2308.13712 - Code: https://github.com/nachifur/RDDM **UniGS: Unified Representation for Image Generation and Segmentation** - Paper: https://arxiv.org/abs/2312.01985 **Multi-Instance Generation Controller for Text-to-Image Synthesis** - Paper: https://arxiv.org/abs/2402.05408 - Code: https://github.com/limuloo/migc **SVGDreamer: Text Guided SVG Generation with Diffusion 
Model**

- Paper: https://arxiv.org/abs/2312.16476
- Code: https://ximinng.github.io/SVGDreamer-project/

**InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model**

- Paper: https://arxiv.org/abs/2312.05849
- Code: https://github.com/jiuntian/interactdiffusion

**Ranni: Taming Text-to-Image Diffusion for Accurate Prompt Following**

- Paper: https://arxiv.org/abs/2311.17002
- Code: https://github.com/ali-vilab/Ranni

# Video Generation

**Vlogger: Make Your Dream A Vlog**

- Paper: https://arxiv.org/abs/2401.09414
- Code: https://github.com/Vchitect/Vlogger

**VBench: Comprehensive Benchmark Suite for Video Generative Models**

- Homepage: https://vchitect.github.io/VBench-project/
- Paper: https://arxiv.org/abs/2311.17982
- Code: https://github.com/Vchitect/VBench

**VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models**

- Homepage: https://video-motion-customization.github.io/
- Paper: https://arxiv.org/abs/2312.00845
- Code: https://github.com/HyeonHo99/Video-Motion-Customization

# 3D Generation

**CityDreamer: Compositional Generative Model of Unbounded 3D Cities**

- Homepage: https://haozhexie.com/project/city-dreamer/
- Paper: https://arxiv.org/abs/2309.00610
- Code: https://github.com/hzxie/city-dreamer

**LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching**

- Paper: https://arxiv.org/abs/2311.11284
- Code: https://github.com/EnVision-Research/LucidDreamer

# Video Understanding

**MVBench: A Comprehensive Multi-modal Video Understanding Benchmark**

- Paper: https://arxiv.org/abs/2311.17005
- Code: https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2

# Knowledge Distillation

**Logit Standardization in Knowledge Distillation**

- Paper: https://arxiv.org/abs/2403.01427
- Code: https://github.com/sunshangquan/logit-standardization-KD

**Efficient Dataset Distillation via Minimax Diffusion**

- Paper: https://arxiv.org/abs/2311.15529
- Code:
https://github.com/vimar-gu/MinimaxDiffusion

# Stereo Matching

**Neural Markov Random Field for Stereo Matching**

- Paper: https://arxiv.org/abs/2403.11193
- Code: https://github.com/aeolusguan/NMRF

# Scene Graph Generation

**HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation**

- Homepage: https://zhangce01.github.io/HiKER-SGG/
- Paper: https://arxiv.org/abs/2403.12033
- Code: https://github.com/zhangce01/HiKER-SGG

# Video Quality Assessment

**KVQ: Kaleidoscope Video Quality Assessment for Short-form Videos**

- Homepage: https://lixinustc.github.io/projects/KVQ/
- Paper: https://arxiv.org/abs/2402.07220
- Code: https://github.com/lixinustc/KVQ-Challenge-CVPR-NTIRE2024

# Datasets

**A Real-world Large-scale Dataset for Roadside Cooperative Perception**

- Paper: https://arxiv.org/abs/2403.10145
- Code: https://github.com/AIR-THU/DAIR-RCooper

**Traffic Scene Parsing through the TSP6K Dataset**

- Paper: https://arxiv.org/pdf/2303.02835.pdf
- Code: https://github.com/PengtaoJiang/TSP6K

# Others

**Object Recognition as Next Token Prediction**

- Paper: https://arxiv.org/abs/2312.02142
- Code: https://github.com/kaiyuyue/nxtp

**ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks**

- Paper: https://arxiv.org/abs/2306.14525
- Code: https://parameternet.github.io/

**Seamless Human Motion Composition with Blended Positional Encodings**

- Paper: https://arxiv.org/abs/2402.15509
- Code: https://github.com/BarqueroGerman/FlowMDM

**LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning**

- Homepage: https://ll3da.github.io/
- Paper: https://arxiv.org/abs/2311.18651
- Code: https://github.com/Open3DA/LL3DA

**CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update**

- Homepage: https://clova-tool.github.io/
- Paper: https://arxiv.org/abs/2312.10908

**MoMask: Generative Masked Modeling of 3D Human Motions**

- Paper:
https://arxiv.org/abs/2312.00063
- Code: https://github.com/EricGuo5513/momask-codes

**Amodal Ground Truth and Completion in the Wild**

- Homepage: https://www.robots.ox.ac.uk/~vgg/research/amodal/
- Paper: https://arxiv.org/abs/2312.17247
- Code: https://github.com/Championchess/Amodal-Completion-in-the-Wild

**Improved Visual Grounding through Self-Consistent Explanations**

- Paper: https://arxiv.org/abs/2312.04554
- Code: https://github.com/uvavision/SelfEQ

**ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object**

- Homepage: https://chenshuang-zhang.github.io/imagenet_d/
- Paper: https://arxiv.org/abs/2403.18775
- Code: https://github.com/chenshuang-zhang/imagenet_d

**Learning from Synthetic Human Group Activities**

- Homepage: https://cjerry1243.github.io/M3Act/
- Paper: https://arxiv.org/abs/2306.16772
- Code: https://github.com/cjerry1243/M3Act

**A Cross-Subject Brain Decoding Framework**

- Homepage: https://littlepure2333.github.io/MindBridge/
- Paper: https://arxiv.org/abs/2404.07850
- Code: https://github.com/littlepure2333/MindBridge

**Multi-Task Dense Prediction via Mixture of Low-Rank Experts**

- Paper: https://arxiv.org/abs/2403.17749
- Code: https://github.com/YuqiYang213/MLoRE

**Contrastive Mean-Shift Learning for Generalized Category Discovery**

- Homepage: https://postech-cvlab.github.io/cms/
- Paper: https://arxiv.org/abs/2404.09451
- Code: https://github.com/sua-choi/CMS

================================================
FILE: CVPR2025-Papers-with-Code.md
================================================

# CVPR 2025 Papers and Open-Source Projects (Papers with Code)

CVPR 2025 decisions are now available on OpenReview! 22.1% = 2878 / 13008

> Note 1: Issues sharing CVPR 2025 papers and open-source projects are welcome!
>
> Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision
>
> - [ICCV 2025](https://github.com/amusi/ICCV2025-Papers-with-Code)
> - [ECCV 2024](https://github.com/amusi/ECCV2024-Papers-with-Code)
> - [CVPR 2024](CVPR2024-Papers-with-Code.md)

Scan the QR code to join the CVer academic exchange group for the latest work from CVPR 2025 and beyond. It is the largest computer vision AI Knowledge Planet community, updated daily with the newest learning materials on computer vision, AIGC, diffusion models, multimodal learning, deep learning, autonomous driving, medical imaging, and remote sensing.

![](CVer学术交流群.png)

# CVPR 2025 Open-Source Papers: Table of Contents

- [3DGS (Gaussian Splatting)](#3DGS)
- [Agent](#Agent)
- [Avatars](#Avatars)
- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [Mamba](#Mamba)
- [Embodied AI](#Embodied-AI)
- [GAN](#GAN)
- [GNN](#GNN)
- [Multimodal Large Language Models (MLLM)](#MLLM)
- [Large Language Models (LLM)](#LLM)
- [NAS](#NAS)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [DETR](#DETR)
- [Diffusion Models](#Diffusion)
- [ReID (Re-Identification)](#ReID)
- [Long-Tail](#Long-Tail)
- [Vision Transformer](#Vision-Transformer)
- [Vision-Language](#VL)
- [Self-supervised Learning](#SSL)
- [Data Augmentation](#DA)
- [Object Detection](#Object-Detection)
- [Anomaly Detection](#Anomaly-Detection)
- [Visual Tracking](#VT)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Panoptic Segmentation](#Panoptic-Segmentation)
- [Medical Image](#MI)
- [Medical Image Segmentation](#MIS)
- [Video Object Segmentation](#VOS)
- [Video Instance Segmentation](#VIS)
- [Referring Image Segmentation](#RIS)
- [Image Matting](#Matting)
- [Image Editing](#Image-Editing)
- [Low-level Vision](#LLV)
- [Super-Resolution](#SR)
- [Denoising](#Denoising)
- [Deblur](#Deblur)
- [Autonomous Driving](#Autonomous-Driving)
- [3D Point Cloud](#3D-Point-Cloud)
- [3D Object Detection](#3DOD)
- [3D Semantic Segmentation](#3DSS)
- [3D Object Tracking](#3D-Object-Tracking)
- [3D Semantic Scene Completion](#3DSSC)
- [3D Registration](#3D-Registration)
- [3D Human Pose Estimation](#3D-Human-Pose-Estimation)
- [3D Human Mesh Estimation](#3D-Human-Pose-Estimation)
- [3D Visual Grounding](#3DVG)
- [Medical Image](#Medical-Image)
- [Image Generation](#Image-Generation)
- [Video Generation](#Video-Generation)
- [3D Generation](#3D-Generation)
- [Video Understanding](#Video-Understanding)
- [Action Detection](#Action-Detection)
- [Embodied AI](#Embodied)
- [Text Detection](#Text-Detection)
- [Knowledge Distillation](#KD)
- [Model Pruning](#Pruning)
- [Image Compression](#IC)
- [3D Reconstruction](#3D-Reconstruction)
- [Depth Estimation](#Depth-Estimation)
- [Trajectory Prediction](#TP)
- [Lane Detection](#Lane-Detection)
- [Image Captioning](#Image-Captioning)
- [Visual Question Answering](#VQA)
- [Sign Language Recognition](#SLR)
- [Video Prediction](#Video-Prediction)
- [Novel View Synthesis](#NVS)
- [Zero-Shot Learning](#ZSL)
- [Stereo Matching](#Stereo-Matching)
- [Feature Matching](#Feature-Matching)
- [Low-light Image Enhancement](#Low-light)
- [Scene Graph Generation](#SGG)
- [Style Transfer](#ST)
- [Implicit Neural Representations](#INR)
- [Image Quality Assessment](#IQA)
- [Video Quality Assessment](#Video-Quality-Assessment)
- [Compressive Sensing](#CS)
- [Datasets](#Datasets)
- [New Tasks](#New-Tasks)
- [Others](#Others)

# 3DGS (Gaussian Splatting)

# Agent

**SpiritSight Agent: Advanced GUI Agent with One Look**

- Paper: https://arxiv.org/abs/2503.03196
- Code: https://hzhiyuan.github.io/SpiritSight-Agent

# Avatars

# Backbone

**Building Vision Models upon Heat Conduction**

- Paper: https://arxiv.org/abs/2405.16555
- Code: https://github.com/MzeroMiko/vHeat

**LSNet: See Large, Focus Small**

- Paper: https://arxiv.org/abs/2503.23135
- Code: https://github.com/jameslahm/lsnet

#
CLIP # Mamba **MambaVision: A Hybrid Mamba-Transformer Vision Backbone** - Paper: https://arxiv.org/abs/2407.08083 - Code: https://github.com/NVlabs/MambaVision **MobileMamba: Lightweight Multi-Receptive Visual Mamba Network** - Paper: https://arxiv.org/abs/2411.15941 - Code: https://github.com/lewandofskee/MobileMamba **MambaIC: State Space Models for High-Performance Learned Image Compression** - Paper: https://arxiv.org/abs/2503.12461 - Code: https://arxiv.org/abs/2503.12461 # Embodied AI **CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos** - Project: https://ai4ce.github.io/CityWalker/ - Paper: https://arxiv.org/abs/2411.17820 - Code: https://github.com/ai4ce/CityWalker # GAN # OCR # NeRF # DETR **Mr. DETR: Instructive Multi-Route Training for Detection Transformers** - Paper: https://arxiv.org/abs/2412.10028 - Code: https://github.com/Visual-AI/Mr.DETR # Prompt # 多模态大语言模型(MLLM) **LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences** - Paper: https://arxiv.org/abs/2412.01292 - Code: https://github.com/Hoyyyaard/LSceneLLM **DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution** - Paper: https://arxiv.org/abs/2405.16071 - Code: https://github.com/callsys/DynRefer **Retrieval-Augmented Personalization for Multimodal Large Language Models** - Project Page: https://hoar012.github.io/RAP-Project/ - Paper: https://arxiv.org/abs/2410.13360 - Code: https://github.com/Hoar012/RAP-MLLM **BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models** - Paper: https://arxiv.org/abs/2411.15232 - Code: https://github.com/HealthX-Lab/BiomedCoOp **FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression** - Paper: https://arxiv.org/abs/2412.04317 - Code: https://github.com/codefanw/FlashSloth **MMRL: Multi-Modal Representation Learning for Vision-Language Models** - Paper: https://arxiv.org/abs/2503.08497 - Code: https://github.com/yunncheng/MMRL **PAVE: Patching and 
Adapting Video Large Language Models** - Paper: https://arxiv.org/abs/2503.19794 - Code: https://github.com/dragonlzm/PAVE **AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization** - Paper: https://arxiv.org/abs/2503.23733 - Code: https://github.com/THUNLP-MT/AdaMMS # 大语言模型(LLM) # NAS # ReID(重识别) **From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization** - Paper: https://arxiv.org/abs/2503.00938 - Code: https://github.com/yuanc3/Pose2ID **AirRoom: Objects Matter in Room Reidentification** - Project: https://sairlab.org/airroom/ - Paper: https://arxiv.org/abs/2503.01130 **IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification** - Paper: https://arxiv.org/abs/2503.10324 - Code: https://github.com/924973292/IDEA # 扩散模型(Diffusion Models) **TinyFusion: Diffusion Transformers Learned Shallow** - Paper: https://arxiv.org/abs/2412.01199 - Code: https://github.com/VainF/TinyFusion **DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture** - Paper: https://arxiv.org/abs/2409.03550 - Code: https://github.com/qianlong0502/DKDM **Tiled Diffusion** - Homepage: https://madaror.github.io/tiled-diffusion.github.io/ - Paper: https://arxiv.org/abs/2412.15185 - Code: https://github.com/madaror/tiled-diffusion # Vision Transformer # 视觉和语言(Vision-Language) **NLPrompt: Noise-Label Prompt Learning for Vision-Language Models** - Paper: https://arxiv.org/abs/2412.01256 - Code: https://github.com/qunovo/NLPrompt **PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability** - Paper: https://arxiv.org/abs/2503.08481 - Code: https://github.com/unira-zwj/PhysVLM **MMRL: Multi-Modal Representation Learning for Vision-Language Models** - Paper: https://arxiv.org/abs/2503.08497 - Code: https://github.com/yunncheng/MMRL # 目标检测(Object Detection) **LLMDet: Learning Strong Open-Vocabulary Object Detectors 
under the Supervision of Large Language Models**

- Paper: https://arxiv.org/abs/2501.18954
- Code: https://github.com/iSEE-Laboratory/LLMDet

**Mr. DETR: Instructive Multi-Route Training for Detection Transformers**

- Paper: https://arxiv.org/abs/2412.10028
- Code: https://github.com/Visual-AI/Mr.DETR

# Anomaly Detection

# Object Tracking

**Multiple Object Tracking as ID Prediction**

- Paper: https://arxiv.org/abs/2403.16848
- Code: https://github.com/MCG-NJU/MOTIP

**Omnidirectional Multi-Object Tracking**

- Paper: https://arxiv.org/abs/2503.04565
- Code: https://github.com/xifen523/OmniTrack

# Medical Image

**BrainMVP: Multi-modal Vision Pre-training for Medical Image Analysis**

- Paper: https://arxiv.org/abs/2410.10604
- Code: https://github.com/shaohao011/BrainMVP

# Medical Image Segmentation

**Test-Time Domain Generalization via Universe Learning: A Multi-Graph Matching Approach for Medical Image Segmentation**

- Paper: https://arxiv.org/abs/2503.13012
- Code: https://github.com/Yore0/TTDG-MGM

# Autonomous Driving

**LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes**

- Project: https://ldkong.com/LiMoE
- Paper: https://arxiv.org/abs/2501.04004
- Code: https://github.com/Xiangxu-0103/LiMoE

# 3D Point Cloud

**Unlocking Generalization Power in LiDAR Point Cloud Registration**

- Paper: https://arxiv.org/abs/2503.10149
- Code: https://github.com/peakpang/UGP

# 3D Object Detection

# 3D Semantic Segmentation

# Low-level Vision

# Super-Resolution

**AESOP: Auto-Encoded Supervision for Perceptual Image Super-Resolution**

- Paper: https://arxiv.org/abs/2412.00124
- Code: https://github.com/2minkyulee/AESOP-Auto-Encoded-Supervision-for-Perceptual-Image-Super-Resolution

# Denoising

## Image Denoising

# 3D Human Pose Estimation

**Reconstructing Humans with a Biomechanically Accurate Skeleton**

- Homepage: https://isshikihugh.github.io/HSMR/
- Code:
https://github.com/IsshikiHugh/HSMR

# 3D Visual Grounding

**ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding**

- Homepage: https://pqh22.github.io/projects/ProxyTransformation/index.html
- Code: https://github.com/pqh22/ProxyTransformation
- Paper: https://arxiv.org/abs/2502.19247

# Image Generation

**Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models**

- Paper: https://arxiv.org/abs/2501.01423
- Code: https://github.com/hustvl/LightningDiT

**SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models**

- Paper: https://arxiv.org/abs/2412.04852
- Code: https://github.com/taco-group/SleeperMark

**TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation**

- Homepage: https://byteflow-ai.github.io/TokenFlow/
- Code: https://github.com/ByteFlow-AI/TokenFlow
- Paper: https://arxiv.org/abs/2412.03069

**PAR: Parallelized Autoregressive Visual Generation**

- Project: https://epiphqny.github.io/PAR-project/
- Paper: https://arxiv.org/abs/2412.15119
- Code: https://github.com/Epiphqny/PAR

**Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis**

- Project: https://generative-photography.github.io/project/
- Paper: https://arxiv.org/abs/2412.02168
- Code: https://github.com/pandayuanyu/generative-photography

**OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation**

- Project Page: https://opening-benchmark.github.io/
- Paper: https://arxiv.org/abs/2411.18499
- Code: https://github.com/LanceZPF/OpenING # 视频生成(Video Generation) **Identity-Preserving Text-to-Video Generation by Frequency Decomposition** - Paper: https://arxiv.org/abs/2411.17440 - Code: https://github.com/PKU-YuanGroup/ConsisID **Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models** - Paper: https://arxiv.org/abs/2407.15642 - Code: https://github.com/maxin-cn/Cinemo **X-Dyna: Expressive Dynamic Human Image Animation** - Paper: https://arxiv.org/abs/2501.10021 - Code: https://github.com/bytedance/X-Dyna **PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation** - Paper: https://arxiv.org/pdf/2412.00596 - Code: https://github.com/pittisl/PhyT2V **Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model** - Project: https://liewfeng.github.io/TeaCache/ - Paper: https://arxiv.org/abs/2411.19108 - Code: https://github.com/ali-vilab/TeaCache **AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion** - Project: https://iva-mzsun.github.io/AR-Diffusion - Paper: https://arxiv.org/abs/2503.07418 - Code: https://github.com/iva-mzsun/AR-Diffusion # 图像编辑(Image Editing) **Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing** - Paper: https://arxiv.org/abs/2411.16832 - Code: https://github.com/taco-group/FaceLock **h-Edit: Effective and Flexible Diffusion-Based Editing via Doob’s h-Transform** - Paper: https://arxiv.org/abs/2503.02187 - Code: https://github.com/nktoan/h-edit # 视频编辑(Video Editing) # 3D生成(3D Generation) **Generative Gaussian Splatting for Unbounded 3D City Generation** - Project: https://haozhexie.com/project/gaussian-city - Paper: https://arxiv.org/abs/2406.06526 - Code: https://github.com/hzxie/GaussianCity **StdGEN: Semantic-Decomposed 3D Character Generation from Single Images** - Project: https://stdgen.github.io/ - Paper: https://arxiv.org/abs/2411.05738 - Code: https://github.com/hyz317/StdGEN # 3D重建(3D 
Reconstruction)

**Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass**

- Project: https://fast3r-3d.github.io/
- Paper: https://arxiv.org/abs/2501.13928

# Human Motion Generation

**SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance**

- Project: https://4dvlab.github.io/project_page/semgeomo/
- Paper: https://arxiv.org/abs/2503.01291
- Code: https://github.com/4DVLab/SemGeoMo

# Video Understanding

**Temporal Grounding Videos like Flipping Manga**

- Paper: https://arxiv.org/abs/2411.10332
- Code: https://github.com/yongliang-wu/NumPro

# Embodied AI

**Universal Actions for Enhanced Embodied Foundation Models**

- Project: https://2toinf.github.io/UniAct/
- Paper: https://arxiv.org/abs/2501.10105
- Code: https://github.com/2toinf/UniAct

**PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability**

- Paper: https://arxiv.org/abs/2503.08481
- Code: https://github.com/unira-zwj/PhysVLM

# Knowledge Distillation

# Depth Estimation

**DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos**

- Project: https://depthcrafter.github.io
- Paper: https://arxiv.org/abs/2409.02095
- Code: https://github.com/Tencent/DepthCrafter

**MonSter: Marry Monodepth to Stereo Unleashes Power**

- Paper: https://arxiv.org/abs/2501.08643
- Code: https://github.com/Junda24/MonSter

**DEFOM-Stereo: Depth Foundation Model Based Stereo Matching**

- Project: https://insta360-research-team.github.io/DEFOM-Stereo/
- Paper: https://arxiv.org/abs/2501.09466
- Code: https://github.com/Insta360-Research-Team/DEFOM-Stereo

# Stereo Matching

**MonSter: Marry Monodepth to Stereo Unleashes Power**

- Paper: https://arxiv.org/abs/2501.08643
- Code: https://github.com/Junda24/MonSter

# Low-light Image Enhancement

**HVI: A New Color Space for Low-light Image Enhancement**

- Paper: https://arxiv.org/abs/2502.20272
- Code: https://github.com/Fediory/HVI-CIDNet
- Demo:
https://huggingface.co/spaces/Fediory/HVI-CIDNet_Low-light-Image-Enhancement_

**ReDDiT: Efficient Diffusion as Low Light Enhancer**

- Paper: https://arxiv.org/abs/2410.12346
- Code: https://github.com/lgz-0713/ReDDiT

# Image Compression

**MambaIC: State Space Models for High-Performance Learned Image Compression**

- Paper: https://arxiv.org/abs/2503.12461
- Code: https://arxiv.org/abs/2503.12461

# Scene Graph Generation

# Style Transfer

**StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements**

- Project: https://stylestudio-official.github.io/
- Paper: https://arxiv.org/abs/2412.08503
- Code: https://github.com/Westlake-AGI-Lab/StyleStudio

# Image Quality Assessment

**Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language**

- Homepage: https://yichengchen24.github.io/projects/autocherrypicker
- Paper: https://arxiv.org/pdf/2406.20085
- Code: https://github.com/yichengchen24/ACP

# Video Quality Assessment

# Compressive Sensing

**Using Powerful Prior Knowledge of Diffusion Model in Deep Unfolding Networks for Image Compressive Sensing**

- Paper: https://arxiv.org/abs/2503.08429
- Code: https://github.com/FengodChen/DMP-DUN-CVPR2025

# Datasets

**Objaverse++: Curated 3D Object Dataset with Quality Annotations**

- Paper: https://arxiv.org/abs/2504.07334
- Code: https://github.com/TCXX/ObjaversePlusPlus

# Others

**DTGBrepGen: A Novel B-rep Generative Model through Decoupling Topology and Geometry**

- Paper: https://arxiv.org/abs/2503.13110
- Code: https://github.com/jinli99/DTGBrepGen

**Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation**

- Paper: https://arxiv.org/abs/2503.19307
- Code: https://github.com/delaprada/HandSynthesis.git

**EVOS: Efficient Implicit Neural Training via EVOlutionary Selector**

- Homepage: https://weixiang-zhang.github.io/proj-evos/
- Paper: https://arxiv.org/abs/2412.10153
- Code: https://github.com/zwx-open/EVOS-INR
================================================
FILE: README.md
================================================
# CVPR 2026 Papers and Open-Source Projects (Papers with Code)

CVPR 2026 decisions are now available on OpenReview! 25.42% = 4090 / 16092

> Note 1: Issues sharing CVPR 2026 papers and open-source projects are welcome!
>
> Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision
>
> - [ICCV 2025](https://github.com/amusi/ICCV2025-Papers-with-Code)
> - [ECCV 2024](https://github.com/amusi/ECCV2024-Papers-with-Code)

Scan the QR code to join the CVer academic discussion group and follow the latest work from CVPR 2026 and beyond. It is the largest computer vision AI community on Knowledge Planet (知识星球), updated daily with the newest learning materials on computer vision, AIGC, diffusion models, multimodal learning, deep learning, autonomous driving, medical imaging, remote sensing, and related areas.

![](CVer学术交流群.png)

# CVPR 2026 Papers with Code: Table of Contents

- [3DGS (Gaussian Splatting)](#3DGS)
- [Agent](#Agent)
- [Avatars](#Avatars)
- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [Mamba](#Mamba)
- [Embodied AI](#Embodied-AI)
- [GAN](#GAN)
- [GNN](#GNN)
- [Multimodal Large Language Models (MLLM)](#MLLM)
- [Large Language Models (LLM)](#LLM)
- [Embodied AI](#Embodied)
- [Spatial Intelligence](#SI)
- [NAS](#NAS)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [DETR](#DETR)
- [Diffusion Models](#Diffusion)
- [ReID (Re-Identification)](#ReID)
- [Long-Tail Distribution](#Long-Tail)
- [Vision Transformer](#Vision-Transformer)
- [Vision-Language](#VL)
- [Self-supervised Learning](#SSL)
- [Data Augmentation](#DA)
- [Object Detection](#Object-Detection)
- [Anomaly Detection](#Anomaly-Detection)
- [Visual Tracking](#VT)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Panoptic Segmentation](#Panoptic-Segmentation)
- [Medical Image](#MI)
- [Medical Image Segmentation](#MIS)
- [Video Object Segmentation](#VOS)
- [Video Instance Segmentation](#VIS)
- [Referring Image Segmentation](#RIS)
- [Image Matting](#Matting)
- [Image Editing](#Image-Editing)
- [Low-level Vision](#LLV)
- [Super-Resolution](#SR)
- [Denoising](#Denoising)
- [Deblur](#Deblur)
- [Autonomous Driving](#Autonomous-Driving)
- [3D Point Cloud](#3D-Point-Cloud)
- [3D Object Detection](#3DOD)
- [3D Semantic Segmentation](#3DSS)
- [3D Object Tracking](#3D-Object-Tracking)
- [3D Semantic Scene Completion](#3DSSC)
- [3D Registration](#3D-Registration)
- [3D Human Pose Estimation](#3D-Human-Pose-Estimation)
- [3D Human Mesh Estimation](#3D-Human-Pose-Estimation)
- [3D Visual Grounding](#3DVG)
- [Medical Image](#Medical-Image)
- [Image Generation](#Image-Generation)
- [Video Generation](#Video-Generation)
- [3D Generation](#3D-Generation)
- [Video Understanding](#Video-Understanding)
- [Action Detection](#Action-Detection)
- [Remote Sensing](#Remote)
- [Text Detection](#Text-Detection)
- [Knowledge Distillation](#KD)
- [Model Pruning](#Pruning)
- [Image Compression](#IC)
- [Video Compression](#VC)
- [3D Reconstruction](#3D-Reconstruction)
- [Depth Estimation](#Depth-Estimation)
- [Trajectory Prediction](#TP)
- [Lane Detection](#Lane-Detection)
- [Image Captioning](#Image-Captioning)
- [Visual Question Answering](#VQA)
- [Sign Language Recognition](#SLR)
- [Video Prediction](#Video-Prediction)
- [Novel View Synthesis](#NVS)
- [Zero-Shot Learning](#ZSL)
- [Stereo Matching](#Stereo-Matching)
- [Feature Matching](#Feature-Matching)
- [Low-light Image Enhancement](#Low-light)
- [Scene Graph Generation](#SGG)
- [Image Retrieval](#Image-Retrieval)
- [Style Transfer](#ST)
- [Implicit Neural Representations](#INR)
- [Image Quality Assessment](#IQA)
- [Video Quality Assessment](#Video-Quality-Assessment)
- [Compressive Sensing](#CS)
- [Datasets](#Datasets)
- [New Tasks](#New-Tasks)
- [Others](#Others)

# 3DGS (Gaussian Splatting)

**Dropping Anchor and Spherical Harmonics for Sparse-view Gaussian
Splatting**

- Paper: https://arxiv.org/abs/2602.20933
- Code:
- Project: https://sk-fun.fun/DropAnSH-GS

**Topology-Aware Gaussian Splatting for Dynamic Mesh Modeling and Tracking**

- Paper: https://arxiv.org/abs/2512.01329
- Project: https://haza628.github.io/tagSplat/

**FastGS: Training 3D Gaussian Splatting in 100 Seconds**

- Paper: https://arxiv.org/pdf/2511.04283
- Code: https://github.com/fastgs/FastGS
- Project: https://fastgs.github.io/

# Agent

# Avatars

# Backbone

# CLIP

# Mamba

# GAN

# OCR

# NeRF

# DETR

# Prompt

# Multimodal Large Language Models (MLLM)

**Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking**

- Paper: https://arxiv.org/abs/2602.20330
- Code: https://github.com/UIUC-MONET/vlm-circuit-tracing

**UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark**

- Paper: https://arxiv.org/abs/2603.05075
- Code:
- Project: https://any2any-mllm.github.io/unim/

# Large Language Models (LLM)

# Embodied AI

**Wanderland: Geometrically Grounded Simulation for Open-World Embodied AI**

- Paper: https://arxiv.org/abs/2511.20620
- Code: https://github.com/ai4ce/wanderland
- Project: https://ai4ce.github.io/wanderland/

# Spatial Intelligence

**Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning**

- Paper: https://arxiv.org/abs/2510.27606
- Code: https://github.com/InternLM/Spatial-SSRL
- Model: https://huggingface.co/internlm/Spatial-SSRL-7B

# NAS

# ReID (Re-Identification)

**MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-Identification**

- Paper: https://arxiv.org/abs/2512.03404
- Code: https://github.com/yjzhao1019/MOS

# Diffusion Models

# Vision Transformer

# Vision-Language

**StructXLIP: Enhancing Vision-language Models with Multimodal Structural Cues**

- Paper: https://arxiv.org/abs/2602.20089
- Code: https://github.com/intelligolabs/StructXLIP

**ApET: Approximation-Error Guided Token Compression for Efficient VLMs**

- Paper: https://arxiv.org/abs/2602.19870
- Code:
https://github.com/MaQianKun0/ApET

**Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking**

- Paper: https://arxiv.org/abs/2602.20330
- Code: https://github.com/UIUC-MONET/vlm-circuit-tracing

# Object Detection

# Anomaly Detection

# Object Tracking

# Medical Image

# Medical Image Segmentation

**MedCLIPSeg: Probabilistic Vision–Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation**

- Paper: https://arxiv.org/abs/2602.20423
- Code: https://github.com/HealthX-Lab/MedCLIPSeg
- Project: https://tahakoleilat.github.io/MedCLIPSeg

# Autonomous Driving

**Open-Vocabulary Domain Generalization in Urban-Scene Segmentation**

- Paper: https://arxiv.org/pdf/2602.18853
- Code: https://github.com/DZhaoXd/s2_corr

**U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences**

- Paper: https://arxiv.org/abs/2512.02982
- Code: https://github.com/worldbench/U4D

# 3D Point Cloud

**CLIPoint3D: Language-Grounded Few-Shot Unsupervised 3D Point Cloud Domain Adaptation**

- Paper: https://arxiv.org/abs/2602.20409
- Code: https://github.com/SarthakM320/CLIPoint3D

# 3D Object Detection

# 3D Semantic Segmentation

# Low-level Vision

# Super-Resolution

# Denoising

## Image Denoising

# 3D Human Pose Estimation

# 3D Visual Grounding

# Image Generation

**ExpPortrait: Expressive Portrait Generation via Personalized Representation**

- Paper: https://arxiv.org/abs/2602.19900
- Code:

# Video Generation

# Image Editing

# Video Editing

# 3D Generation

# 3D Reconstruction

**tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction**

- Project: https://cwchenwang.github.io/tttLRM/
- Paper: https://arxiv.org/abs/2602.20160
- Code: https://github.com/cwchenwang/tttLRM

**Flow3r: Factored Flow Prediction for Scalable Visual Geometry Learning**

- Project:
https://flow3r-project.github.io/
- Paper: https://arxiv.org/abs/2602.20157
- Code: https://github.com/Kidrauh/flow3r

**RAP: Fast Feedforward Rendering-Free Attribute-Guided Primitive Importance Score Prediction for Efficient 3D Gaussian Splatting Processing**

- Paper: https://arxiv.org/abs/2602.19753
- Code: https://github.com/yyyykf/RAP

# Human Motion Generation

# Video Understanding

# Remote Sensing

**Brewing Stronger Features: Dual-Teacher Distillation for Multispectral Earth Observation**

- Paper: https://arxiv.org/abs/2602.19863
- Code: None

# Knowledge Distillation

# Depth Estimation

# Stereo Matching

# Low-light Image Enhancement

# Image Compression

# Video Compression

**UniComp: Rethinking Video Compression Through Informational Uniqueness**

- Paper: https://arxiv.org/abs/2512.03575
- Code: https://github.com/TimeMarker-LLM/UniComp

# Scene Graph Generation

# Image Retrieval

**PinPoint: Evaluation of Composed Image Retrieval with Explicit Negatives, Multi-Image Queries, and Paraphrase Testing**

- Paper: https://arxiv.org/abs/2603.04598
- Code:

# Style Transfer

# Image Quality Assessment

# Video Quality Assessment

# Compressive Sensing

# Datasets

# Others

**Decoupling Defense Strategies for Robust Image Watermarking**

- Paper: https://arxiv.org/abs/2602.20053
- Code: None

**Multi-Modal Representation Learning via Semi-Supervised Rate Reduction for Generalized Category Discovery**

- Paper: https://arxiv.org/abs/2602.19910
- Code:

**The Invisible Gorilla Effect in Out-of-distribution Detection**

- Paper: https://arxiv.org/abs/2602.20068
- Code: https://github.com/HarryAnthony/Invisible_Gorilla_Effect

**SimLBR: Learning to Detect Fake Images by Learning to Detect Real Images**

- Paper: https://arxiv.org/abs/2602.20412
- Code:

**RecoverMark: Robust Watermarking for Localization and Recovery of Manipulated Faces**

- Paper: https://arxiv.org/abs/2602.20618
- Code:

**Probing and Bridging Geometry-Interaction Cues for Affordance Reasoning in Vision Foundation Models**

- Paper:
- Code:

**GEM-TFL: Bridging Weak and Full Supervision for Forgery Localization through EM-Guided Decomposition and Temporal Refinement**

- Paper: https://arxiv.org/abs/2603.05095
- Code:

**FOZO: Forward-Only Zeroth-Order Prompt Optimization for Test-Time Adaptation**

- Paper: https://arxiv.org/abs/2603.04733
- Code: https://github.com/eVI-group-SCU/FOZO

**Mitigating Instance Entanglement in Instance-Dependent Partial Label Learning**

- Paper: https://arxiv.org/abs/2603.04825
- Code: https://github.com/RyanZhaoIc/CAD

================================================
FILE: master
================================================