[
  {
    "path": "CVPR2019-Papers-with-Code.md",
    "content": "# CVPR2019-Code\n\nCVPR 2019 论文开源项目合集\n\n传送门：[CVPR 2020 论文开源项目合集](https://github.com/amusi/CVPR2020-Code)\n\n附：[530 篇 CVPR 2019 论文代码链接](./CVPR2019_CodeLink.csv)\n\n- [目标检测](#Object-Detection)\n- [目标跟踪](#Object-Tracking)\n- [语义分割](#Semantic-Segmentation)\n- [实例分割](#Instance-Segmentation)\n- [GAN](#GAN)\n- [人脸检测](#Face-Detection)\n- [人体姿态估计](#Human-Pose-Estimation)\n- [6DoF 姿态估计](#6DoF-Pose-Estimation)\n- [头部姿态估计](#Head-Pose-Estimation)\n- [人群密度估计](#Crowd-Counting)\n\n**更新记录：**\n\n- 20200226：添加 [CVPR 2020 论文开源项目合集](https://github.com/amusi/CVPR2020-Code)\n\n- 20191026：添加 [530 篇论文代码链接](./CVPR2019_CodeLink.csv)\n- 20190405：添加 8 篇论文（目标检测、语义分割等方向）\n- 20190408：添加 6 篇论文（目标跟踪、GAN、6DoF姿态估计等方向）\n\n<a name=\"Object-Detection\"></a>\n\n# 目标检测\n\n**Bounding Box Regression with Uncertainty for Accurate Object Detection**\n\n- arXiv：<https://arxiv.org/abs/1809.08545>\n\n- github：<https://github.com/yihui-he/KL-Loss>\n\n<a name=\"Object-Tracking\"></a>\n\n# 目标跟踪\n\n**Fast Online Object Tracking and Segmentation: A Unifying Approach**\n\n- arXiv：<https://arxiv.org/abs/1812.05050>\n\n- github：<https://github.com/foolwood/SiamMask>\n\n- homepage：<http://www.robots.ox.ac.uk/~qwang/SiamMask>\n\n**Unsupervised Deep Tracking**\n\n- arXiv：<https://arxiv.org/abs/1904.01828>\n\n- github：<https://github.com/594422814/UDT>\n\n- github(PyTorch)：<https://github.com/594422814/UDT_pytorch>\n\n**Target-Aware Deep Tracking**\n\n- arXiv：<https://arxiv.org/abs/1904.01772>\n\n- homepage：<https://xinli-zn.github.io/TADT-project-page/>\n\n<a name=\"Semantic-Segmentation\"></a>\n\n# 语义分割\n\n**Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation**\n\n- arXiv：<https://arxiv.org/abs/1903.02120>\n\n- github：[https://github.com/LinZhuoChen/DUpsampling（非官方）](https://github.com/LinZhuoChen/DUpsampling%EF%BC%88%E9%9D%9E%E5%AE%98%E6%96%B9%EF%BC%89)\n\n**Dual Attention Network for Scene Segmentation**\n\n- 
arXiv: <https://arxiv.org/abs/1809.02983>\n\n- github: <https://github.com/junfu1115/DANet>\n\n**Collaborative Global-Local Networks for Memory-Efficient Segmentation of Ultra-High Resolution Images**\n\n- arXiv: None\n\n- github: <https://github.com/chenwydj/ultra_high_resolution_segmentation>\n\n<a name=\"Instance-Segmentation\"></a>\n\n# Instance Segmentation\n\n**Mask Scoring R-CNN**\n\n- arXiv: <https://arxiv.org/abs/1903.00241>\n\n- github: <https://github.com/zjhuang22/maskscoring_rcnn>\n\n<a name=\"GAN\"></a>\n\n# GAN\n\n**Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis**\n\n- arXiv: <https://arxiv.org/abs/1903.05628>\n- github: <https://github.com/HelenMao/MSGAN>\n\n<a name=\"Face-Detection\"></a>\n\n# Face Detection\n\n**DSFD: Dual Shot Face Detector**\n\n- arXiv: <https://arxiv.org/abs/1810.10220>\n\n- github: <https://github.com/TencentYoutuResearch/FaceDetection-DSFD>\n\n<a name=\"Human-Pose-Estimation\"></a>\n\n# Human Pose Estimation\n\n**Deep High-Resolution Representation Learning for Human Pose Estimation**\n\n- arXiv: <https://arxiv.org/abs/1902.09212>\n\n- github: <https://github.com/leoxiaobin/deep-high-resolution-net.pytorch>\n\n<a name=\"6DoF-Pose-Estimation\"></a>\n\n# 6DoF Pose Estimation\n\n**PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation**\n\n- arXiv: <https://arxiv.org/abs/1812.11788>\n- github: <https://github.com/zju3dv/pvnet>\n\n<a name=\"Head-Pose-Estimation\"></a>\n\n# Head Pose Estimation\n\n**FSA-Net: Learning Fine-Grained Structure Aggregation for Head Pose Estimation from a Single Image**\n\n- paper: <https://github.com/shamangary/FSA-Net/blob/master/0191.pdf>\n- github: <https://github.com/shamangary/FSA-Net>\n\n<a name=\"Crowd-Counting\"></a>\n\n# Crowd Counting\n\n**Learning from Synthetic Data for Crowd Counting in the Wild**\n\n- arXiv: <https://arxiv.org/abs/1903.03303>\n- github: <https://github.com/gjy3035/GCC-SFCN>\n- homepage: <https://gjy3035.github.io/GCC-CL/>"
  },
  {
    "path": "CVPR2020-Papers-with-Code.md",
    "content": "# CVPR2020-Code\n\n[CVPR 2020](https://openaccess.thecvf.com/CVPR2020) 论文开源项目合集，同时欢迎各位大佬提交issue，分享CVPR 2020开源项目\n\n**【推荐阅读】**\n\n- [CVPR 2020 virtual](http://cvpr20.com/)\n- ECCV 2020 论文开源项目合集来了：https://github.com/amusi/ECCV2020-Code\n\n- 关于往年CV顶会论文（如ECCV 2020、CVPR 2019、ICCV 2019）以及其他优质CV论文和大盘点，详见： https://github.com/amusi/daily-paper-computer-vision\n\n**【CVPR 2020 论文开源目录】**\n\n- [CNN](#CNN)\n- [图像分类](#Image-Classification)\n- [视频分类](#Video-Classification)\n- [目标检测](#Object-Detection)\n- [3D目标检测](#3D-Object-Detection)\n- [视频目标检测](#Video-Object-Detection)\n- [目标跟踪](#Object-Tracking)\n- [语义分割](#Semantic-Segmentation)\n- [实例分割](#Instance-Segmentation)\n- [全景分割](#Panoptic-Segmentation)\n- [视频目标分割](#VOS)\n- [超像素分割](#Superpixel)\n- [交互式图像分割](#IIS)\n- [NAS](#NAS)\n- [GAN](#GAN)\n- [Re-ID](#Re-ID)\n- [3D点云（分类/分割/配准/跟踪等）](#3D-PointCloud)\n- [人脸（识别/检测/重建等）](#Face)\n- [人体姿态估计(2D/3D)](#Human-Pose-Estimation)\n- [人体解析](#Human-Parsing)\n- [场景文本检测](#Scene-Text-Detection)\n- [场景文本识别](#Scene-Text-Recognition)\n- [特征(点)检测和描述](#Feature)\n- [超分辨率](#Super-Resolution)\n- [模型压缩/剪枝](#Model-Compression)\n- [视频理解/行为识别](#Action-Recognition)\n- [人群计数](#Crowd-Counting)\n- [深度估计](#Depth-Estimation)\n- [6D目标姿态估计](#6DOF)\n- [手势估计](#Hand-Pose)\n- [显著性检测](#Saliency)\n- [去噪](#Denoising)\n- [去雨](#Deraining)\n- [去模糊](#Deblurring)\n- [去雾](#Dehazing)\n- [特征点检测与描述](#Feature)\n- [视觉问答(VQA)](#VQA)\n- [视频问答(VideoQA)](#VideoQA)\n- [视觉语言导航](#VLN)\n- [视频压缩](#Video-Compression)\n- [视频插帧](#Video-Frame-Interpolation)\n- [风格迁移](#Style-Transfer)\n- [车道线检测](#Lane-Detection)\n- [\"人-物\"交互(HOI)检测](#HOI)\n- [轨迹预测](#TP)\n- [运动预测](#Motion-Predication)\n- [光流估计](#OF)\n- [图像检索](#IR)\n- [虚拟试衣](#Virtual-Try-On)\n- [HDR](#HDR)\n- [对抗样本](#AE)\n- [三维重建](#3D-Reconstructing)\n- [深度补全](#DC)\n- [语义场景补全](#SSC)\n- [图像/视频描述](#Captioning)\n- [线框解析](#WP)\n- [数据集](#Datasets)\n- [其他](#Others)\n- [不确定中没中](#Not-Sure)\n\n<a name=\"CNN\"></a>\n\n# CNN\n\n**Exploring Self-attention for Image Recognition**\n\n- 
Paper: https://hszhao.github.io/papers/cvpr20_san.pdf\n\n- Code: https://github.com/hszhao/SAN\n\n**Improving Convolutional Networks with Self-Calibrated Convolutions**\n\n- Homepage: https://mmcheng.net/scconv/\n\n- Paper: http://mftp.mmcheng.net/Papers/20cvprSCNet.pdf\n- Code: https://github.com/backseason/SCNet\n\n**Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets**\n\n- Paper: https://arxiv.org/abs/2003.13549\n- Code: https://github.com/zeiss-microscopy/BSConv\n\n<a name=\"Image-Classification\"></a>\n\n# Image Classification\n\n**Interpretable and Accurate Fine-grained Recognition via Region Grouping**\n\n- Paper: https://arxiv.org/abs/2005.10411\n\n- Code: https://github.com/zxhuang1698/interpretability-by-parts\n\n**Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion**\n\n- Paper: https://arxiv.org/abs/2003.04490\n\n- Code: https://github.com/AdamKortylewski/CompositionalNets\n\n**Spatially Attentive Output Layer for Image Classification**\n\n- Paper: https://arxiv.org/abs/2004.07570\n- Code (appears to have been deleted by the original authors): https://github.com/ildoonet/spatially-attentive-output-layer\n\n<a name=\"Video-Classification\"></a>\n\n# Video Classification\n\n**SmallBigNet: Integrating Core and Contextual Views for Video Classification**\n\n- Paper: https://arxiv.org/abs/2006.14582\n- Code: https://github.com/xhl-video/SmallBigNet\n\n<a name=\"Object-Detection\"></a>\n\n# Object Detection\n\n**Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Li_Overcoming_Classifier_Imbalance_for_Long-Tail_Object_Detection_With_Balanced_Group_CVPR_2020_paper.pdf\n- Code: https://github.com/FishYuLi/BalancedGroupSoftmax\n\n**AugFPN: Improving Multi-scale Feature Learning for Object Detection**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Guo_AugFPN_Improving_Multi-Scale_Feature_Learning_for_Object_Detection_CVPR_2020_paper.pdf\n- 
Code: https://github.com/Gus-Guo/AugFPN\n\n**Noise-Aware Fully Webly Supervised Object Detection**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Shen_Noise-Aware_Fully_Webly_Supervised_Object_Detection_CVPR_2020_paper.html\n- Code: https://github.com/shenyunhang/NA-fWebSOD/\n\n**Learning a Unified Sample Weighting Network for Object Detection**\n\n- Paper: https://arxiv.org/abs/2006.06568\n- Code: https://github.com/caiqi/sample-weighting-network\n\n**D2Det: Towards High Quality Object Detection and Instance Segmentation**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Cao_D2Det_Towards_High_Quality_Object_Detection_and_Instance_Segmentation_CVPR_2020_paper.pdf\n\n- Code: https://github.com/JialeCao001/D2Det\n\n**Dynamic Refinement Network for Oriented and Densely Packed Object Detection**\n\n- Paper: https://arxiv.org/abs/2005.09973\n\n- Code and dataset: https://github.com/Anymake/DRN_CVPR2020\n\n**Scale-Equalizing Pyramid Convolution for Object Detection**\n\n- Paper: https://arxiv.org/abs/2005.03101\n\n- Code: https://github.com/jshilong/SEPC\n\n**Revisiting the Sibling Head in Object Detector**\n\n- Paper: https://arxiv.org/abs/2003.07540\n\n- Code: https://github.com/Sense-X/TSD\n\n**Detection in Crowded Scenes: One Proposal, Multiple Predictions**\n\n- Paper: https://arxiv.org/abs/2003.09163\n- Code: https://github.com/megvii-model/CrowdDetection\n\n**Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection**\n\n- Paper: https://arxiv.org/abs/2004.04725\n- Code: https://github.com/NVlabs/wetectron\n\n**Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection**\n\n- Paper: https://arxiv.org/abs/1912.02424\n- Code: https://github.com/sfzhang15/ATSS\n\n**BiDet: An Efficient Binarized Object Detector**\n\n- Paper: https://arxiv.org/abs/2003.03961\n- Code: https://github.com/ZiweiWangTHU/BiDet\n\n**Harmonizing 
Transferability and Discriminability for Adapting Object Detectors**\n\n- Paper: https://arxiv.org/abs/2003.06297\n- Code: https://github.com/chaoqichen/HTCN\n\n**CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection**\n\n- Paper: https://arxiv.org/abs/2003.09119\n- Code: https://github.com/KiveeDong/CentripetalNet\n\n**Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection**\n\n- Paper: https://arxiv.org/abs/2003.11818\n- Code: https://github.com/ggjy/HitDet.pytorch\n\n**EfficientDet: Scalable and Efficient Object Detection**\n\n- Paper: https://arxiv.org/abs/1911.09070\n- Code: https://github.com/google/automl/tree/master/efficientdet\n\n<a name=\"3D-Object-Detection\"></a>\n\n# 3D Object Detection\n\n**SESS: Self-Ensembling Semi-Supervised 3D Object Detection**\n\n- Paper: https://arxiv.org/abs/1912.11803\n\n- Code: https://github.com/Na-Z/sess\n\n**Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection**\n\n- Paper: https://arxiv.org/abs/2006.04356\n\n- Code: https://github.com/dleam/Associate-3Ddet\n\n**What You See is What You Get: Exploiting Visibility for 3D Object Detection**\n\n- Homepage: https://www.cs.cmu.edu/~peiyunh/wysiwyg/\n\n- Paper: https://arxiv.org/abs/1912.04986\n- Code: https://github.com/peiyunh/wysiwyg\n\n**Learning Depth-Guided Convolutions for Monocular 3D Object Detection**\n\n- Paper: https://arxiv.org/abs/1912.04799\n- Code: https://github.com/dingmyu/D4LCN\n\n**Structure Aware Single-stage 3D Object Detection from Point Cloud**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/He_Structure_Aware_Single-Stage_3D_Object_Detection_From_Point_Cloud_CVPR_2020_paper.html\n\n- Code: https://github.com/skyhehe123/SA-SSD\n\n**IDA-3D: Instance-Depth-Aware 3D Object Detection from Stereo Vision for Autonomous Driving**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Peng_IDA-3D_Instance-Depth-Aware_3D_Object_Detection_From_Stereo_Vision_for_Autonomous_CVPR_2020_paper.pdf\n\n- 
Code: https://github.com/swords123/IDA-3D\n\n**Train in Germany, Test in The USA: Making 3D Object Detectors Generalize**\n\n- Paper: https://arxiv.org/abs/2005.08139\n\n- Code: https://github.com/cxy1997/3D_adapt_auto_driving\n\n**MLCVNet: Multi-Level Context VoteNet for 3D Object Detection**\n\n- Paper: https://arxiv.org/abs/2004.05679\n- Code: https://github.com/NUAAXQ/MLCVNet\n\n**3DSSD: Point-based 3D Single Stage Object Detector**\n\n- CVPR 2020 Oral\n\n- Paper: https://arxiv.org/abs/2002.10187\n\n- Code: https://github.com/tomztyang/3DSSD\n\n**Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation**\n\n- Paper: https://arxiv.org/abs/2004.03572\n\n- Code: https://github.com/zju3dv/disprcn\n\n**End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection**\n\n- Paper: https://arxiv.org/abs/2004.03080\n\n- Code: https://github.com/mileyan/pseudo-LiDAR_e2e\n\n**DSGN: Deep Stereo Geometry Network for 3D Object Detection**\n\n- Paper: https://arxiv.org/abs/2001.03398\n- Code: https://github.com/chenyilun95/DSGN\n\n**LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention**\n\n- Paper: https://arxiv.org/abs/2004.01389\n- Code: https://github.com/yinjunbo/3DVID\n\n**PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection**\n\n- Paper: https://arxiv.org/abs/1912.13192\n\n- Code: https://github.com/sshaoshuai/PV-RCNN\n\n**Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud**\n\n- Paper: https://arxiv.org/abs/2003.01251\n- Code: https://github.com/WeijingShi/Point-GNN\n\n<a name=\"Video-Object-Detection\"></a>\n\n# Video Object Detection\n\n**Memory Enhanced Global-Local Aggregation for Video Object Detection**\n\n- Paper: https://arxiv.org/abs/2003.12063\n\n- Code: https://github.com/Scalsol/mega.pytorch\n\n<a name=\"Object-Tracking\"></a>\n\n# Object Tracking\n\n**SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking**\n\n- Paper: https://arxiv.org/abs/1911.07241\n- 
Code: https://github.com/ohhhyeahhh/SiamCAR\n\n**D3S -- A Discriminative Single Shot Segmentation Tracker**\n\n- Paper: https://arxiv.org/abs/1911.08862\n- Code: https://github.com/alanlukezic/d3s\n\n**ROAM: Recurrently Optimizing Tracking Model**\n\n- Paper: https://arxiv.org/abs/1907.12006\n\n- Code: https://github.com/skyoung/ROAM\n\n**Siam R-CNN: Visual Tracking by Re-Detection**\n\n- Homepage: https://www.vision.rwth-aachen.de/page/siamrcnn\n- Paper: https://arxiv.org/abs/1911.12836\n- Paper 2: https://www.vision.rwth-aachen.de/media/papers/192/siamrcnn.pdf\n- Code: https://github.com/VisualComputingInstitute/SiamR-CNN\n\n**Cooling-Shrinking Attack: Blinding the Tracker with Imperceptible Noises**\n\n- Paper: https://arxiv.org/abs/2003.09595\n- Code: https://github.com/MasterBin-IIAU/CSA\n\n**High-Performance Long-Term Tracking with Meta-Updater**\n\n- Paper: https://arxiv.org/abs/2004.00305\n\n- Code: https://github.com/Daikenan/LTMU\n\n**AutoTrack: Towards High-Performance Visual Tracking for UAV with Automatic Spatio-Temporal Regularization**\n\n- Paper: https://arxiv.org/abs/2003.12949\n\n- Code: https://github.com/vision4robotics/AutoTrack\n\n**Probabilistic Regression for Visual Tracking**\n\n- Paper: https://arxiv.org/abs/2003.12565\n- Code: https://github.com/visionml/pytracking\n\n**MAST: A Memory-Augmented Self-supervised Tracker**\n\n- Paper: https://arxiv.org/abs/2002.07793\n- Code: https://github.com/zlai0/MAST\n\n**Siamese Box Adaptive Network for Visual Tracking**\n\n- Paper: https://arxiv.org/abs/2003.06761\n- Code: https://github.com/hqucv/siamban\n\n## Multi-Object Tracking\n\n**3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset**\n\n- Homepage: https://vap.aau.dk/3d-zef/\n- Paper: https://arxiv.org/abs/2006.08466\n- Code: https://bitbucket.org/aauvap/3d-zef/src/master/\n- Dataset: https://motchallenge.net/data/3D-ZeF20\n\n<a name=\"Semantic-Segmentation\"></a>\n\n# Semantic Segmentation\n\n**FDA: Fourier Domain Adaptation for Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2004.05498\n\n- Code: https://github.com/YanchaoYang/FDA\n\n**Super-BPD: Super 
Boundary-to-Pixel Direction for Fast Image Segmentation**\n\n- Paper: not yet available\n\n- Code: https://github.com/JianqiangWan/Super-BPD\n\n**Single-Stage Semantic Segmentation from Image Labels**\n\n- Paper: https://arxiv.org/abs/2005.08104\n\n- Code: https://github.com/visinf/1-stage-wseg\n\n**Learning Texture Invariant Representation for Domain Adaptation of Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2003.00867\n- Code: https://github.com/MyeongJin-Kim/Learning-Texture-Invariant-Representation\n\n**MSeg: A Composite Dataset for Multi-domain Semantic Segmentation**\n\n- Paper: http://vladlen.info/papers/MSeg.pdf\n- Code: https://github.com/mseg-dataset/mseg-api\n\n**CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement**\n\n- Paper: https://arxiv.org/abs/2005.02551\n- Code: https://github.com/hkchengrex/CascadePSP\n\n**Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision**\n\n- Oral\n- Paper: https://arxiv.org/abs/2004.07703\n- Code: https://github.com/feipan664/IntraDA\n\n**Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2004.04581\n- Code: https://github.com/YudeWang/SEAM\n\n**Temporally Distributed Networks for Fast Video Segmentation**\n\n- Paper: https://arxiv.org/abs/2004.01800\n\n- Code: https://github.com/feinanshan/TDNet\n\n**Context Prior for Scene Segmentation**\n\n- Paper: https://arxiv.org/abs/2004.01547\n\n- Code: https://git.io/ContextPrior\n\n**Strip Pooling: Rethinking Spatial Pooling for Scene Parsing**\n\n- Paper: https://arxiv.org/abs/2003.13328\n\n- Code: https://github.com/Andrew-Qibin/SPNet\n\n**Cars Can't Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks**\n\n- Paper: https://arxiv.org/abs/2003.05128\n- Code: https://github.com/shachoi/HANet\n\n**Learning Dynamic Routing for Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2003.10401\n\n- Code: https://github.com/yanwei-li/DynamicRouting\n\n<a 
name=\"Instance-Segmentation\"></a>\n\n# Instance Segmentation\n\n**D2Det: Towards High Quality Object Detection and Instance Segmentation**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Cao_D2Det_Towards_High_Quality_Object_Detection_and_Instance_Segmentation_CVPR_2020_paper.pdf\n\n- Code: https://github.com/JialeCao001/D2Det\n\n**PolarMask: Single Shot Instance Segmentation with Polar Representation**\n\n- Paper: https://arxiv.org/abs/1909.13226\n- Code: https://github.com/xieenze/PolarMask\n- Explainer: https://zhuanlan.zhihu.com/p/84890413\n\n**CenterMask: Real-Time Anchor-Free Instance Segmentation**\n\n- Paper: https://arxiv.org/abs/1911.06667\n- Code: https://github.com/youngwanLEE/CenterMask\n\n**BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation**\n\n- Paper: https://arxiv.org/abs/2001.00309\n- Code: https://github.com/aim-uofa/AdelaiDet\n\n**Deep Snake for Real-Time Instance Segmentation**\n\n- Paper: https://arxiv.org/abs/2001.01629\n- Code: https://github.com/zju3dv/snake\n\n**Mask Encoding for Single Shot Instance Segmentation**\n\n- Paper: https://arxiv.org/abs/2003.11712\n\n- Code: https://github.com/aim-uofa/AdelaiDet\n\n<a name=\"Panoptic-Segmentation\"></a>\n\n# Panoptic Segmentation\n\n**Video Panoptic Segmentation**\n\n- Paper: https://arxiv.org/abs/2006.11339\n- Code: https://github.com/mcahny/vps\n- Dataset: https://www.dropbox.com/s/ecem4kq0fdkver4/cityscapes-vps-dataset-1.0.zip?dl=0\n\n**Pixel Consensus Voting for Panoptic Segmentation**\n\n- Paper: https://arxiv.org/abs/2004.01849\n- Code: not yet released\n\n**BANet: Bidirectional Aggregation Network with Occlusion Handling for Panoptic Segmentation**\n\n- Paper: https://arxiv.org/abs/2003.14031\n\n- Code: https://github.com/Mooonside/BANet\n\n<a name=\"VOS\"></a>\n\n# Video Object Segmentation\n\n**A Transductive Approach for Video Object Segmentation**\n\n- Paper: https://arxiv.org/abs/2004.07193\n\n- Code: https://github.com/microsoft/transductive-vos.pytorch\n\n**State-Aware Tracker for Real-Time Video Object Segmentation**\n\n- Paper: https://arxiv.org/abs/2003.00482\n\n- 
Code: https://github.com/MegviiDetection/video_analyst\n\n**Learning Fast and Robust Target Models for Video Object Segmentation**\n\n- Paper: https://arxiv.org/abs/2003.00908\n- Code: https://github.com/andr345/frtm-vos\n\n**Learning Video Object Segmentation from Unlabeled Videos**\n\n- Paper: https://arxiv.org/abs/2003.05020\n- Code: https://github.com/carrierlxk/MuG\n\n<a name=\"Superpixel\"></a>\n\n# Superpixel Segmentation\n\n**Superpixel Segmentation with Fully Convolutional Networks**\n\n- Paper: https://arxiv.org/abs/2003.12929\n- Code: https://github.com/fuy34/superpixel_fcn\n\n<a name=\"IIS\"></a>\n\n# Interactive Image Segmentation\n\n**Interactive Object Segmentation with Inside-Outside Guidance**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Interactive_Object_Segmentation_With_Inside-Outside_Guidance_CVPR_2020_paper.pdf\n- Code: https://github.com/shiyinzhang/Inside-Outside-Guidance\n- Dataset: https://github.com/shiyinzhang/Pixel-ImageNet\n\n<a name=\"NAS\"></a>\n\n# NAS\n\n**AOWS: Adaptive and optimal network width search with latency constraints**\n\n- Paper: https://arxiv.org/abs/2005.10481\n- Code: https://github.com/bermanmaxim/AOWS\n\n**Densely Connected Search Space for More Flexible Neural Architecture Search**\n\n- Paper: https://arxiv.org/abs/1906.09607\n\n- Code: https://github.com/JaminFong/DenseNAS\n\n**MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning**\n\n- Paper: https://arxiv.org/abs/2003.14058\n\n- Code: https://github.com/bhpfelix/MTLNAS\n\n**FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions**\n\n- Paper: https://arxiv.org/abs/2004.05565\n\n- Code: https://github.com/facebookresearch/mobile-vision\n\n**Neural Architecture Search for Lightweight Non-Local Networks**\n\n- Paper: https://arxiv.org/abs/2004.01961\n- Code: https://github.com/LiYingwei/AutoNL\n\n**Rethinking Performance Estimation in Neural Architecture Search**\n\n- Paper: https://arxiv.org/abs/2005.09917\n- 
Code: https://github.com/zhengxiawu/rethinking_performance_estimation_in_NAS\n- Explainer 1: https://www.zhihu.com/question/372070853/answer/1035234510\n- Explainer 2: https://zhuanlan.zhihu.com/p/111167409\n\n**CARS: Continuous Evolution for Efficient Neural Architecture Search**\n\n- Paper: https://arxiv.org/abs/1909.04977\n- Code (to be released soon): https://github.com/huawei-noah/CARS\n\n<a name=\"GAN\"></a>\n\n# GAN\n\n**SEAN: Image Synthesis with Semantic Region-Adaptive Normalization**\n\n- Paper: https://arxiv.org/abs/1911.12861\n- Code: https://github.com/ZPdesu/SEAN\n\n**Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Chen_Reusing_Discriminators_for_Encoding_Towards_Unsupervised_Image-to-Image_Translation_CVPR_2020_paper.html\n- Code: https://github.com/alpc91/NICE-GAN-pytorch\n\n**Distribution-induced Bidirectional Generative Adversarial Network for Graph Representation Learning**\n\n- Paper: https://arxiv.org/abs/1912.01899\n- Code: https://github.com/SsGood/DBGAN\n\n**PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer**\n\n- Paper: https://arxiv.org/abs/1909.06956\n- Code: https://github.com/wtjiang98/PSGAN\n\n**Semantically Multi-modal Image Synthesis**\n\n- Homepage: http://seanseattle.github.io/SMIS\n- Paper: https://arxiv.org/abs/2003.12697\n- Code: https://github.com/Seanseattle/SMIS\n\n**Unpaired Portrait Drawing Generation via Asymmetric Cycle Mapping**\n\n- Paper: https://yiranran.github.io/files/CVPR2020_Unpaired%20Portrait%20Drawing%20Generation%20via%20Asymmetric%20Cycle%20Mapping.pdf\n- Code: https://github.com/yiranran/Unpaired-Portrait-Drawing\n\n**Learning to Cartoonize Using White-box Cartoon Representations**\n\n- Paper: https://github.com/SystemErrorWang/White-box-Cartoonization/blob/master/paper/06791.pdf\n\n- Homepage: https://systemerrorwang.github.io/White-box-Cartoonization/\n- Code: https://github.com/SystemErrorWang/White-box-Cartoonization\n- Explainer: https://zhuanlan.zhihu.com/p/117422157\n- 
Demo video: https://www.bilibili.com/video/av56708333\n\n**GAN Compression: Efficient Architectures for Interactive Conditional GANs**\n\n- Paper: https://arxiv.org/abs/2003.08936\n\n- Code: https://github.com/mit-han-lab/gan-compression\n\n**Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions**\n\n- Paper: https://arxiv.org/abs/2003.01826\n- Code: https://github.com/cc-hpc-itwm/UpConv\n\n<a name=\"Re-ID\"></a>\n\n# Re-ID\n\n**High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Wang_High-Order_Information_Matters_Learning_Relation_and_Topology_for_Occluded_Person_CVPR_2020_paper.html\n- Code: https://github.com/wangguanan/HOReID\n\n**COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification**\n\n- Paper: https://arxiv.org/abs/2005.07862\n\n- Dataset: not yet available\n\n**Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification With Deep Mis-Ranking**\n\n- Paper: https://arxiv.org/abs/2004.04199\n\n- Code: https://github.com/whj363636/Adversarial-attack-on-Person-ReID-With-Deep-Mis-Ranking\n\n**Pose-guided Visible Part Matching for Occluded Person ReID**\n\n- Paper: https://arxiv.org/abs/2004.00230\n- Code: https://github.com/hh23333/PVPM\n\n**Weakly supervised discriminative feature learning with state information for person identification**\n\n- Paper: https://arxiv.org/abs/2002.11939\n- Code: https://github.com/KovenYu/state-information\n\n<a name=\"3D-PointCloud\"></a>\n\n# 3D Point Clouds (classification/segmentation/registration/tracking, etc.)\n\n## 3D Point Cloud Convolution\n\n**PointASNL: Robust Point Clouds Processing using Nonlocal Neural Networks with Adaptive Sampling**\n\n- Paper: https://arxiv.org/abs/2003.00492\n- Code: https://github.com/yanx27/PointASNL\n\n**Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds**\n\n- Paper: https://arxiv.org/abs/2003.12971\n\n- Code: https://github.com/raoyongming/PointGLR\n\n**Grid-GCN 
for Fast and Scalable Point Cloud Learning**\n\n- Paper: https://arxiv.org/abs/1912.02984\n\n- Code: https://github.com/Xharlie/Grid-GCN\n\n**FPConv: Learning Local Flattening for Point Convolution**\n\n- Paper: https://arxiv.org/abs/2002.10701\n- Code: https://github.com/lyqun/FPConv\n\n## 3D Point Cloud Classification\n\n**PointAugment: an Auto-Augmentation Framework for Point Cloud Classification**\n\n- Paper: https://arxiv.org/abs/2002.10876\n- Code (to be released soon): https://github.com/liruihui/PointAugment/\n\n## 3D Point Cloud Semantic Segmentation\n\n**RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds**\n\n- Paper: https://arxiv.org/abs/1911.11236\n- Code: https://github.com/QingyongHu/RandLA-Net\n\n- Explainer: https://zhuanlan.zhihu.com/p/105433460\n\n**Weakly Supervised Semantic Point Cloud Segmentation: Towards 10X Fewer Labels**\n\n- Paper: https://arxiv.org/abs/2004.04091\n\n- Code: https://github.com/alex-xun-xu/WeakSupPointCloudSeg\n\n**PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2003.14032\n- Code: https://github.com/edwardzhou130/PolarSeg\n\n**Learning to Segment 3D Point Clouds in 2D Image Space**\n\n- Paper: https://arxiv.org/abs/2003.05593\n\n- Code: https://github.com/WPI-VISLab/Learning-to-Segment-3D-Point-Clouds-in-2D-Image-Space\n\n## 3D Point Cloud Instance Segmentation\n\n**PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation**\n\n- Paper: https://arxiv.org/abs/2004.01658\n- Code: https://github.com/Jia-Research-Lab/PointGroup\n\n## 3D Point Cloud Registration\n\n**Feature-metric Registration: A Fast Semi-supervised Approach for Robust Point Cloud Registration without Correspondences**\n\n- Paper: https://arxiv.org/abs/2005.01014\n- Code: https://github.com/XiaoshuiHuang/fmr\n\n**D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features**\n\n- Paper: https://arxiv.org/abs/2003.03164\n- Code: https://github.com/XuyangBai/D3Feat\n\n**RPM-Net: Robust Point Matching using Learned Features**\n\n- Paper: https://arxiv.org/abs/2003.13479\n- Code: https://github.com/yewzijian/RPMNet\n\n## 
3D Point Cloud Completion\n\n**Cascaded Refinement Network for Point Cloud Completion**\n\n- Paper: https://arxiv.org/abs/2004.03327\n- Code: https://github.com/xiaogangw/cascaded-point-completion\n\n## 3D Point Cloud Object Tracking\n\n**P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds**\n\n- Paper: https://arxiv.org/abs/2005.13888\n- Code: https://github.com/HaozheQi/P2B\n\n## Others\n\n**An Efficient PointLSTM for Point Clouds Based Gesture Recognition**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Min_An_Efficient_PointLSTM_for_Point_Clouds_Based_Gesture_Recognition_CVPR_2020_paper.html\n- Code: https://github.com/Blueprintf/pointlstm-gesture-recognition-pytorch\n\n<a name=\"Face\"></a>\n\n# Faces\n\n## Face Recognition\n\n**CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition**\n\n- Paper: https://arxiv.org/abs/2004.00288\n\n- Code: https://github.com/HuangYG123/CurricularFace\n\n**Learning Meta Face Recognition in Unseen Domains**\n\n- Paper: https://arxiv.org/abs/2003.07733\n- Code: https://github.com/cleardusk/MFR\n- Explainer: https://mp.weixin.qq.com/s/YZoEnjpnlvb90qSI3xdJqQ\n\n## Face Detection\n\n## Face Anti-Spoofing\n\n**Searching Central Difference Convolutional Networks for Face Anti-Spoofing**\n\n- Paper: https://arxiv.org/abs/2003.04092\n\n- Code: https://github.com/ZitongYu/CDCN\n\n## Facial Expression Recognition\n\n**Suppressing Uncertainties for Large-Scale Facial Expression Recognition**\n\n- Paper: https://arxiv.org/abs/2002.10392\n\n- Code (to be released soon): https://github.com/kaiwang960112/Self-Cure-Network\n\n## Face Frontalization\n\n**Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images**\n\n- Paper: https://arxiv.org/abs/2003.08124\n- Code: https://github.com/Hangz-nju-cuhk/Rotate-and-Render\n\n## 3D Face Reconstruction\n\n**AvatarMe: Realistically Renderable 3D Facial Reconstruction \"in-the-wild\"**\n\n- Paper: https://arxiv.org/abs/2003.13845\n- Dataset: https://github.com/lattas/AvatarMe\n\n**FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction**\n\n- Paper: https://arxiv.org/abs/2003.13989\n- 
Code: https://github.com/zhuhao-nju/facescape\n\n<a name=\"Human-Pose-Estimation\"></a>\n\n# Human Pose Estimation (2D/3D)\n\n## 2D Human Pose Estimation\n\n**TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting**\n\n- Homepage: https://yzhq97.github.io/transmomo/\n\n- Paper: https://arxiv.org/abs/2003.14401\n- Code: https://github.com/yzhq97/transmomo.pytorch\n\n**HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation**\n\n- Paper: https://arxiv.org/abs/1908.10357\n- Code: https://github.com/HRNet/HigherHRNet-Human-Pose-Estimation\n\n**The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation**\n\n- Paper: https://arxiv.org/abs/1911.07524\n- Code: https://github.com/HuangJunJie2017/UDP-Pose\n- Explainer: https://zhuanlan.zhihu.com/p/92525039\n\n**Distribution-Aware Coordinate Representation for Human Pose Estimation**\n\n- Homepage: https://ilovepose.github.io/coco/\n\n- Paper: https://arxiv.org/abs/1910.06278\n\n- Code: https://github.com/ilovepose/DarkPose\n\n## 3D Human Pose Estimation\n\n**Cascaded Deep Monocular 3D Human Pose Estimation With Evolutionary Training Data**\n\n- Paper: https://arxiv.org/abs/2006.07778\n- Code: https://github.com/Nicholasli1995/EvoSkeleton\n\n**Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach**\n\n- Homepage: https://www.zhe-zhang.com/cvpr2020\n- Paper: https://arxiv.org/abs/2003.11163\n\n- Code: https://github.com/CHUNYUWANG/imu-human-pose-pytorch\n\n**Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data**\n\n- Paper: https://arxiv.org/abs/2004.01166\n\n- Code: https://github.com/Healthcare-Robotics/bodies-at-rest\n- Dataset: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KOA4ML\n\n**Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis**\n\n- Homepage: http://val.cds.iisc.ac.in/pgp-human/\n- Paper: https://arxiv.org/abs/2004.04400\n\n**Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation**\n\n- Paper: https://arxiv.org/abs/2004.00329\n- 
Code: https://github.com/fabbrimatteo/LoCO\n\n**VIBE: Video Inference for Human Body Pose and Shape Estimation**\n\n- Paper: https://arxiv.org/abs/1912.05656\n- Code: https://github.com/mkocabas/VIBE\n\n**Back to the Future: Joint Aware Temporal Deep Learning 3D Human Pose Estimation**\n\n- Paper: https://arxiv.org/abs/2002.11251\n- Code: https://github.com/vnmr/JointVideoPose3D\n\n**Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS**\n\n- Paper: https://arxiv.org/abs/2003.03972\n- Dataset: not yet available\n\n<a name=\"Human-Parsing\"></a>\n\n# Human Parsing\n\n**Correlating Edge, Pose with Parsing**\n\n- Paper: https://arxiv.org/abs/2005.01431\n\n- Code: https://github.com/ziwei-zh/CorrPM\n\n<a name=\"Scene-Text-Detection\"></a>\n\n# Scene Text Detection\n\n**STEFANN: Scene Text Editor using Font Adaptive Neural Network**\n\n- Homepage: https://prasunroy.github.io/stefann/\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Roy_STEFANN_Scene_Text_Editor_Using_Font_Adaptive_Neural_Network_CVPR_2020_paper.html\n- Code: https://github.com/prasunroy/stefann\n- Dataset: https://drive.google.com/open?id=1sEDiX_jORh2X-HSzUnjIyZr-G9LJIw1k\n\n**ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_ContourNet_Taking_a_Further_Step_Toward_Accurate_Arbitrary-Shaped_Scene_Text_CVPR_2020_paper.pdf\n- Code: https://github.com/wangyuxin87/ContourNet\n\n**UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**\n\n- Paper: https://arxiv.org/abs/2003.10608\n- Code and dataset: https://github.com/Jyouhou/UnrealText/\n\n**ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network**\n\n- Paper: https://arxiv.org/abs/2002.10200\n- Code (to be released soon): https://github.com/Yuliang-Liu/bezier_curve_text_spotting\n- Code (to be released soon): https://github.com/aim-uofa/adet\n\n**Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection**\n\n- Paper: https://arxiv.org/abs/2003.07493\n\n- Code: https://github.com/GXYM/DRRG\n\n<a 
name=\"Scene-Text-Recognition\"></a>\n\n# Scene Text Recognition\n\n**SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition**\n\n- Paper: https://arxiv.org/abs/2005.10977\n- Code: https://github.com/Pay20Y/SEED\n\n**UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**\n\n- Paper: https://arxiv.org/abs/2003.10608\n- Code and dataset: https://github.com/Jyouhou/UnrealText/\n\n**ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network**\n\n- Paper: https://arxiv.org/abs/2002.10200\n- Code (to be released): https://github.com/aim-uofa/adet\n\n**Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition**\n\n- Paper: https://arxiv.org/abs/2003.06606\n- Code: https://github.com/Canjie-Luo/Text-Image-Augmentation\n\n<a name=\"Feature\"></a>\n\n# Feature (Point) Detection and Description\n\n**SuperGlue: Learning Feature Matching with Graph Neural Networks**\n\n- Paper: https://arxiv.org/abs/1911.11763\n- Code: https://github.com/magicleap/SuperGluePretrainedNetwork\n\n<a name=\"Super-Resolution\"></a>\n\n# Super-Resolution\n\n## Image Super-Resolution\n\n**Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Guo_Closed-Loop_Matters_Dual_Regression_Networks_for_Single_Image_Super-Resolution_CVPR_2020_paper.html\n- Code: https://github.com/guoyongcs/DRN\n\n**Learning Texture Transformer Network for Image Super-Resolution**\n\n- Paper: https://arxiv.org/abs/2006.04139\n- Code: https://github.com/FuzhiYang/TTSR\n\n**Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining**\n\n- Paper: https://arxiv.org/abs/2006.01424\n- Code: https://github.com/SHI-Labs/Cross-Scale-Non-Local-Attention\n\n**Structure-Preserving Super Resolution with Gradient Guidance**\n\n- Paper: https://arxiv.org/abs/2003.13081\n- Code: https://github.com/Maclory/SPSR\n\n**Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New 
Strategy**\n\n- Paper: https://arxiv.org/abs/2004.00448\n- Code: https://github.com/clovaai/cutblur\n\n## Video Super-Resolution\n\n**TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution**\n\n- Paper: https://arxiv.org/abs/1812.02898\n- Code: https://github.com/YapengTian/TDAN-VSR-CVPR-2020\n\n**Space-Time-Aware Multi-Resolution Video Enhancement**\n\n- Homepage: https://alterzero.github.io/projects/STAR.html\n- Paper: http://arxiv.org/abs/2003.13170\n- Code: https://github.com/alterzero/STARnet\n\n**Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution**\n\n- Paper: https://arxiv.org/abs/2002.11616\n- Code: https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020\n\n<a name=\"Model-Compression\"></a>\n\n# Model Compression/Pruning\n\n**DMCP: Differentiable Markov Channel Pruning for Neural Networks**\n\n- Paper: https://arxiv.org/abs/2005.03354\n- Code: https://github.com/zx55/dmcp\n\n**Forward and Backward Information Retention for Accurate Binary Neural Networks**\n\n- Paper: https://arxiv.org/abs/1909.10788\n- Code: https://github.com/htqin/IR-Net\n\n**Towards Efficient Model Compression via Learned Global Ranking**\n\n- Paper: https://arxiv.org/abs/1904.12368\n- Code: https://github.com/cmu-enyac/LeGR\n\n**HRank: Filter Pruning using High-Rank Feature Map**\n\n- Paper: http://arxiv.org/abs/2002.10179\n- Code: https://github.com/lmbxmu/HRank\n\n**GAN Compression: Efficient Architectures for Interactive Conditional GANs**\n\n- Paper: https://arxiv.org/abs/2003.08936\n- Code: https://github.com/mit-han-lab/gan-compression\n\n**Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression**\n\n- Paper: https://arxiv.org/abs/2003.08935\n- Code: https://github.com/ofsoundof/group_sparsity\n\n<a name=\"Action-Recognition\"></a>\n\n# Video Understanding/Action Recognition\n\n**Oops! 
Predicting Unintentional Action in Video**\n\n- Homepage: https://oops.cs.columbia.edu/\n- Paper: https://arxiv.org/abs/1911.11206\n- Code: https://github.com/cvlab-columbia/oops\n- Dataset: https://oops.cs.columbia.edu/data\n\n**PREDICT & CLUSTER: Unsupervised Skeleton Based Action Recognition**\n\n- Paper: https://arxiv.org/abs/1911.12409\n- Code: https://github.com/shlizee/Predict-Cluster\n\n**Intra- and Inter-Action Understanding via Temporal Action Parsing**\n\n- Paper: https://arxiv.org/abs/2005.10229\n- Homepage and dataset: https://sdolivia.github.io/TAPOS/\n\n**3DV: 3D Dynamic Voxel for Action Recognition in Depth Video**\n\n- Paper: https://arxiv.org/abs/2005.05501\n- Code: https://github.com/3huo/3DV-Action\n\n**FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding**\n\n- Homepage: https://sdolivia.github.io/FineGym/\n- Paper: https://arxiv.org/abs/2004.06704\n\n**TEA: Temporal Excitation and Aggregation for Action Recognition**\n\n- Paper: https://arxiv.org/abs/2004.01398\n- Code: https://github.com/Phoenix1327/tea-action-recognition\n\n**X3D: Expanding Architectures for Efficient Video Recognition**\n\n- Paper: https://arxiv.org/abs/2004.04730\n- Code: https://github.com/facebookresearch/SlowFast\n\n**Temporal Pyramid Network for Action Recognition**\n\n- Homepage: https://decisionforce.github.io/TPN\n- Paper: https://arxiv.org/abs/2004.03548\n- Code: https://github.com/decisionforce/TPN\n\n## Skeleton-Based Action Recognition\n\n**Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition**\n\n- Paper: https://arxiv.org/abs/2003.14111\n- Code: https://github.com/kenziyuliu/ms-g3d\n\n<a name=\"Crowd-Counting\"></a>\n\n# Crowd Counting\n\n<a name=\"Depth-Estimation\"></a>\n\n# Depth Estimation\n\n**BiFuse: Monocular 360° Depth Estimation via Bi-Projection Fusion**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_BiFuse_Monocular_360_Depth_Estimation_via_Bi-Projection_Fusion_CVPR_2020_paper.pdf\n- Code: https://github.com/Yeh-yu-hsuan/BiFuse\n\n**Focus on defocus: bridging the synthetic to real domain gap for depth 
estimation**\n\n- Paper: https://arxiv.org/abs/2005.09623\n- Code: https://github.com/dvl-tum/defocus-net\n\n**Bi3D: Stereo Depth Estimation via Binary Classifications**\n\n- Paper: https://arxiv.org/abs/2005.07274\n- Code: https://github.com/NVlabs/Bi3D\n\n**AANet: Adaptive Aggregation Network for Efficient Stereo Matching**\n\n- Paper: https://arxiv.org/abs/2004.09548\n- Code: https://github.com/haofeixu/aanet\n\n**Towards Better Generalization: Joint Depth-Pose Learning without PoseNet**\n\n- Paper: https://github.com/B1ueber2y/TrianFlow\n- Code: https://github.com/B1ueber2y/TrianFlow\n\n## Monocular Depth Estimation\n\n**On the uncertainty of self-supervised monocular depth estimation**\n\n- Paper: https://arxiv.org/abs/2005.06209\n- Code: https://github.com/mattpoggi/mono-uncertainty\n\n**3D Packing for Self-Supervised Monocular Depth Estimation**\n\n- Paper: https://arxiv.org/abs/1905.02693\n- Code: https://github.com/TRI-ML/packnet-sfm\n- Demo video: https://www.bilibili.com/video/av70562892/\n\n**Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation**\n\n- Paper: https://arxiv.org/abs/2002.12114\n- Code: https://github.com/yzhao520/ARC\n\n<a name=\"6DOF\"></a>\n\n# 6D Object Pose Estimation\n\n**PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/He_PVN3D_A_Deep_Point-Wise_3D_Keypoints_Voting_Network_for_6DoF_CVPR_2020_paper.pdf\n- Code: https://github.com/ethnhe/PVN3D\n\n**MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion**\n\n- Paper: https://arxiv.org/abs/2004.04336\n- Code: https://github.com/wkentaro/morefusion\n\n**EPOS: Estimating 6D Pose of Objects with Symmetries**\n\n- Homepage: http://cmp.felk.cvut.cz/epos\n- Paper: https://arxiv.org/abs/2004.00605\n\n**G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features**\n\n- Paper: https://arxiv.org/abs/2003.11089\n- Code: https://github.com/DC1991/G2L_Net\n\n<a name=\"Hand-Pose\"></a>\n\n# 
Hand Pose Estimation\n\n**HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation**\n\n- Paper: https://arxiv.org/abs/2004.00060\n- Homepage: http://vision.sice.indiana.edu/projects/hopenet\n\n**Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data**\n\n- Paper: https://arxiv.org/abs/2003.09572\n- Code: https://github.com/CalciferZh/minimal-hand\n\n<a name=\"Saliency\"></a>\n\n# Saliency Detection\n\n**JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection**\n\n- Paper: https://arxiv.org/abs/2004.08515\n- Code: https://github.com/kerenfu/JLDCF/\n\n**UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders**\n\n- Homepage: http://dpfan.net/d3netbenchmark/\n- Paper: https://arxiv.org/abs/2004.05763\n- Code: https://github.com/JingZhang617/UCNet\n\n<a name=\"Denoising\"></a>\n\n# Denoising\n\n**A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising**\n\n- Paper: https://arxiv.org/abs/2003.12751\n- Code: https://github.com/Vandermode/NoiseModel\n\n**CycleISP: Real Image Restoration via Improved Data Synthesis**\n\n- Paper: https://arxiv.org/abs/2003.07761\n- Code: https://github.com/swz30/CycleISP\n\n<a name=\"Deraining\"></a>\n\n# Deraining\n\n**Multi-Scale Progressive Fusion Network for Single Image Deraining**\n\n- Paper: https://arxiv.org/abs/2003.10985\n- Code: https://github.com/kuihua/MSPFN\n\n**Detail-recovery Image Deraining via Context Aggregation Networks**\n\n- Paper: https://openaccess.thecvf.com/content_CVPR_2020/html/Deng_Detail-recovery_Image_Deraining_via_Context_Aggregation_Networks_CVPR_2020_paper.html\n- Code: https://github.com/Dengsgithub/DRD-Net\n\n<a name=\"Deblurring\"></a>\n\n# Deblurring\n\n## Video Deblurring\n\n**Cascaded Deep Video Deblurring Using Temporal Sharpness Prior**\n\n- Homepage: https://csbhr.github.io/projects/cdvd-tsp/index.html\n- Paper: https://arxiv.org/abs/2004.02501\n- Code: https://github.com/csbhr/CDVD-TSP\n\n<a name=\"Dehazing\"></a>\n\n# Dehazing\n\n**Domain Adaptation for Image Dehazing**\n\n- 
Paper: https://arxiv.org/abs/2005.04668\n- Code: https://github.com/HUSTSYJ/DA_dahazing\n\n**Multi-Scale Boosted Dehazing Network with Dense Feature Fusion**\n\n- Paper: https://arxiv.org/abs/2004.13388\n- Code: https://github.com/BookerDeWitt/MSBDN-DFF\n\n<a name=\"Feature\"></a>\n\n# Feature Point Detection and Description\n\n**ASLFeat: Learning Local Features of Accurate Shape and Localization**\n\n- Paper: https://arxiv.org/abs/2003.10071\n- Code: https://github.com/lzx551402/aslfeat\n\n<a name=\"VQA\"></a>\n\n# Visual Question Answering (VQA)\n\n**VC R-CNN: Visual Commonsense R-CNN**\n\n- Paper: https://arxiv.org/abs/2002.12204\n- Code: https://github.com/Wangt-CN/VC-R-CNN\n\n<a name=\"VideoQA\"></a>\n\n# Video Question Answering (VideoQA)\n\n**Hierarchical Conditional Relation Networks for Video Question Answering**\n\n- Paper: https://arxiv.org/abs/2002.10698\n- Code: https://github.com/thaolmk54/hcrn-videoqa\n\n<a name=\"VLN\"></a>\n\n# Vision-and-Language Navigation\n\n**Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training**\n\n- Paper: https://arxiv.org/abs/2002.10638\n- Code (to be released): https://github.com/weituo12321/PREVALENT\n\n<a name=\"Video-Compression\"></a>\n\n# Video Compression\n\n**Learning for Video Compression with Hierarchical Quality and Recurrent Enhancement**\n\n- Paper: https://arxiv.org/abs/2003.01966\n- Code: https://github.com/RenYang-home/HLVC\n\n<a name=\"Video-Frame-Interpolation\"></a>\n\n# Video Frame Interpolation\n\n**AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation**\n\n- Paper: https://arxiv.org/abs/1907.10244\n- Code: https://github.com/HyeongminLEE/AdaCoF-pytorch\n\n**FeatureFlow: Robust Video Interpolation via Structure-to-Texture Generation**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Gui_FeatureFlow_Robust_Video_Interpolation_via_Structure-to-Texture_Generation_CVPR_2020_paper.html\n- Code: https://github.com/CM-BF/FeatureFlow\n\n**Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution**\n\n- Paper: https://arxiv.org/abs/2002.11616\n- Code: https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020\n\n**Space-Time-Aware 
Multi-Resolution Video Enhancement**\n\n- Homepage: https://alterzero.github.io/projects/STAR.html\n- Paper: http://arxiv.org/abs/2003.13170\n- Code: https://github.com/alterzero/STARnet\n\n**Scene-Adaptive Video Frame Interpolation via Meta-Learning**\n\n- Paper: https://arxiv.org/abs/2004.00779\n- Code: https://github.com/myungsub/meta-interpolation\n\n**Softmax Splatting for Video Frame Interpolation**\n\n- Homepage: http://sniklaus.com/papers/softsplat\n- Paper: https://arxiv.org/abs/2003.05534\n- Code: https://github.com/sniklaus/softmax-splatting\n\n<a name=\"Style-Transfer\"></a>\n\n# Style Transfer\n\n**Diversified Arbitrary Style Transfer via Deep Feature Perturbation**\n\n- Paper: https://arxiv.org/abs/1909.08223\n- Code: https://github.com/EndyWon/Deep-Feature-Perturbation\n\n**Collaborative Distillation for Ultra-Resolution Universal Style Transfer**\n\n- Paper: https://arxiv.org/abs/2003.08436\n- Code: https://github.com/mingsun-tse/collaborative-distillation\n\n<a name=\"Lane-Detection\"></a>\n\n# Lane Detection\n\n**Inter-Region Affinity Distillation for Road Marking Segmentation**\n\n- Paper: https://arxiv.org/abs/2004.05304\n- Code: https://github.com/cardwing/Codes-for-IntRA-KD\n\n<a name=\"HOI\"></a>\n\n# Human-Object Interaction (HOI) Detection\n\n**PPDM: Parallel Point Detection and Matching for Real-time Human-Object Interaction Detection**\n\n- Paper: https://arxiv.org/abs/1912.12898\n- Code: https://github.com/YueLiao/PPDM\n\n**Detailed 2D-3D Joint Representation for Human-Object Interaction**\n\n- Paper: https://arxiv.org/abs/2004.08154\n- Code: https://github.com/DirtyHarryLYL/DJ-RN\n\n**Cascaded Human-Object Interaction Recognition**\n\n- Paper: https://arxiv.org/abs/2003.04262\n- Code: https://github.com/tfzhou/C-HOI\n\n**VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions**\n\n- Paper: https://arxiv.org/abs/2003.05541\n- Code: https://github.com/ASMIftekhar/VSGNet\n\n<a name=\"TP\"></a>\n\n# Trajectory Prediction\n\n**The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction**\n\n- 
Paper: https://arxiv.org/abs/1912.06445\n- Code: https://github.com/JunweiLiang/Multiverse\n- Dataset: https://next.cs.cmu.edu/multiverse/\n\n**Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction**\n\n- Paper: https://arxiv.org/abs/2002.11927\n- Code: https://github.com/abduallahmohamed/Social-STGCNN\n\n<a name=\"Motion-Predication\"></a>\n\n# Motion Prediction\n\n**Collaborative Motion Prediction via Neural Motion Message Passing**\n\n- Paper: https://arxiv.org/abs/2003.06594\n- Code: https://github.com/PhyllisH/NMMP\n\n**MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps**\n\n- Paper: https://arxiv.org/abs/2003.06754\n- Code: https://github.com/pxiangwu/MotionNet\n\n<a name=\"OF\"></a>\n\n# Optical Flow Estimation\n\n**Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation**\n\n- Paper: https://arxiv.org/abs/2003.13045\n- Code: https://github.com/lliuz/ARFlow\n\n<a name=\"IR\"></a>\n\n# Image Retrieval\n\n**Evade Deep Image Retrieval by Stashing Private Images in the Hash Space**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Xiao_Evade_Deep_Image_Retrieval_by_Stashing_Private_Images_in_the_CVPR_2020_paper.html\n- Code: https://github.com/sugarruy/hashstash\n\n<a name=\"Virtual-Try-On\"></a>\n\n# Virtual Try-On\n\n**Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content**\n\n- Paper: https://arxiv.org/abs/2003.05863\n- Code: https://github.com/switchablenorms/DeepFashion_Try_On\n\n<a name=\"HDR\"></a>\n\n# HDR\n\n**Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline**\n\n- Homepage: https://www.cmlab.csie.ntu.edu.tw/~yulunliu/SingleHDR\n- Paper: https://www.cmlab.csie.ntu.edu.tw/~yulunliu/SingleHDR_/00942.pdf\n- Code: https://github.com/alex04072000/SingleHDR\n\n<a name=\"AE\"></a>\n\n# Adversarial Examples\n\n**Enhancing Cross-Task Black-Box Transferability of Adversarial Examples With Dispersion Reduction**\n\n- 
Paper: https://openaccess.thecvf.com/content_CVPR_2020/papers/Lu_Enhancing_Cross-Task_Black-Box_Transferability_of_Adversarial_Examples_With_Dispersion_Reduction_CVPR_2020_paper.pdf\n- Code: https://github.com/erbloo/dr_cvpr20\n\n**Towards Large yet Imperceptible Adversarial Image Perturbations with Perceptual Color Distance**\n\n- Paper: https://arxiv.org/abs/1911.02466\n- Code: https://github.com/ZhengyuZhao/PerC-Adversarial\n\n<a name=\"3D-Reconstructing\"></a>\n\n# 3D Reconstruction\n\n**Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild**\n\n- **CVPR 2020 Best Paper**\n- Homepage: https://elliottwu.com/projects/unsup3d/\n- Paper: https://arxiv.org/abs/1911.11130\n- Code: https://github.com/elliottwu/unsup3d\n\n**Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization**\n\n- Homepage: https://shunsukesaito.github.io/PIFuHD/\n- Paper: https://arxiv.org/abs/2004.00452\n- Code: https://github.com/facebookresearch/pifuhd\n\n**TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Patel_TailorNet_Predicting_Clothing_in_3D_as_a_Function_of_Human_CVPR_2020_paper.pdf\n- Code: https://github.com/chaitanya100100/TailorNet\n- Dataset: https://github.com/zycliao/TailorNet_dataset\n\n**Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Chibane_Implicit_Functions_in_Feature_Space_for_3D_Shape_Reconstruction_and_CVPR_2020_paper.pdf\n- Code: https://github.com/jchibane/if-net\n\n**Learning to Transfer Texture from Clothing Images to 3D Humans**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Mir_Learning_to_Transfer_Texture_From_Clothing_Images_to_3D_Humans_CVPR_2020_paper.pdf\n- Code: https://github.com/aymenmir1/pix2surf\n\n<a name=\"DC\"></a>\n\n# Depth Completion\n\n**Uncertainty-Aware CNNs for Depth Completion: Uncertainty from Beginning to End**\n\n- Paper: https://arxiv.org/abs/2006.03349\n- Code: https://github.com/abdo-eldesokey/pncnn\n\n<a name=\"SSC\"></a>\n\n# Semantic Scene Completion\n\n**3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure 
Prior**\n\n- Paper: https://arxiv.org/abs/2003.14052\n- Code: https://github.com/charlesCXK/TorchSSC\n\n<a name=\"Captioning\"></a>\n\n# Image/Video Captioning\n\n**Syntax-Aware Action Targeting for Video Captioning**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Zheng_Syntax-Aware_Action_Targeting_for_Video_Captioning_CVPR_2020_paper.pdf\n- Code: https://github.com/SydCaption/SAAT\n\n<a name=\"WP\"></a>\n\n# Wireframe Parsing\n\n**Holistically-Attracted Wireframe Parsing**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Xue_Holistically-Attracted_Wireframe_Parsing_CVPR_2020_paper.html\n- Code: https://github.com/cherubicXN/hawp\n\n<a name=\"Datasets\"></a>\n\n# Datasets\n\n**OASIS: A Large-Scale Dataset for Single Image 3D in the Wild**\n\n- Paper: https://arxiv.org/abs/2007.13215\n- Dataset: https://oasis.cs.princeton.edu/\n\n**STEFANN: Scene Text Editor using Font Adaptive Neural Network**\n\n- Homepage: https://prasunroy.github.io/stefann/\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Roy_STEFANN_Scene_Text_Editor_Using_Font_Adaptive_Neural_Network_CVPR_2020_paper.html\n- Code: https://github.com/prasunroy/stefann\n- Dataset: https://drive.google.com/open?id=1sEDiX_jORh2X-HSzUnjIyZr-G9LJIw1k\n\n**Interactive Object Segmentation with Inside-Outside Guidance**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Interactive_Object_Segmentation_With_Inside-Outside_Guidance_CVPR_2020_paper.pdf\n- Code: https://github.com/shiyinzhang/Inside-Outside-Guidance\n- Dataset: https://github.com/shiyinzhang/Pixel-ImageNet\n\n**Video Panoptic Segmentation**\n\n- Paper: https://arxiv.org/abs/2006.11339\n- Code: https://github.com/mcahny/vps\n- Dataset: https://www.dropbox.com/s/ecem4kq0fdkver4/cityscapes-vps-dataset-1.0.zip?dl=0\n\n**FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Li_FSS-1000_A_1000-Class_Dataset_for_Few-Shot_Segmentation_CVPR_2020_paper.html\n- Code: https://github.com/HKUSTCV/FSS-1000\n- 
Dataset: https://github.com/HKUSTCV/FSS-1000\n\n**3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset**\n\n- Homepage: https://vap.aau.dk/3d-zef/\n- Paper: https://arxiv.org/abs/2006.08466\n- Code: https://bitbucket.org/aauvap/3d-zef/src/master/\n- Dataset: https://motchallenge.net/data/3D-ZeF20\n\n**TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Patel_TailorNet_Predicting_Clothing_in_3D_as_a_Function_of_Human_CVPR_2020_paper.pdf\n- Code: https://github.com/chaitanya100100/TailorNet\n- Dataset: https://github.com/zycliao/TailorNet_dataset\n\n**Oops! Predicting Unintentional Action in Video**\n\n- Homepage: https://oops.cs.columbia.edu/\n- Paper: https://arxiv.org/abs/1911.11206\n- Code: https://github.com/cvlab-columbia/oops\n- Dataset: https://oops.cs.columbia.edu/data\n\n**The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction**\n\n- Paper: https://arxiv.org/abs/1912.06445\n- Code: https://github.com/JunweiLiang/Multiverse\n- Dataset: https://next.cs.cmu.edu/multiverse/\n\n**Open Compound Domain Adaptation**\n\n- Homepage: https://liuziwei7.github.io/projects/CompoundDomain.html\n- Dataset: https://drive.google.com/drive/folders/1_uNTF8RdvhS_sqVTnYx17hEOQpefmE2r?usp=sharing\n- Paper: https://arxiv.org/abs/1909.03403\n- Code: https://github.com/zhmiao/OpenCompoundDomainAdaptation-OCDA\n\n**Intra- and Inter-Action Understanding via Temporal Action Parsing**\n\n- Paper: https://arxiv.org/abs/2005.10229\n- Homepage and dataset: https://sdolivia.github.io/TAPOS/\n\n**Dynamic Refinement Network for Oriented and Densely Packed Object Detection**\n\n- Paper: https://arxiv.org/abs/2005.09973\n- Code and dataset: https://github.com/Anymake/DRN_CVPR2020\n\n**COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification**\n\n- Paper: https://arxiv.org/abs/2005.07862\n- Dataset: not yet available\n\n**KeypointNet: A Large-scale 3D Keypoint Dataset Aggregated from Numerous Human Annotations**\n\n- Paper: https://arxiv.org/abs/2002.12687\n- 
Dataset: https://github.com/qq456cvb/KeypointNet\n\n**MSeg: A Composite Dataset for Multi-domain Semantic Segmentation**\n\n- Paper: http://vladlen.info/papers/MSeg.pdf\n- Code: https://github.com/mseg-dataset/mseg-api\n- Dataset: https://github.com/mseg-dataset/mseg-semantic\n\n**AvatarMe: Realistically Renderable 3D Facial Reconstruction \"in-the-wild\"**\n\n- Paper: https://arxiv.org/abs/2003.13845\n- Dataset: https://github.com/lattas/AvatarMe\n\n**Learning to Autofocus**\n\n- Paper: https://arxiv.org/abs/2004.12260\n- Dataset: not yet available\n\n**FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction**\n\n- Paper: https://arxiv.org/abs/2003.13989\n- Code: https://github.com/zhuhao-nju/facescape\n\n**Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data**\n\n- Paper: https://arxiv.org/abs/2004.01166\n- Code: https://github.com/Healthcare-Robotics/bodies-at-rest\n- Dataset: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KOA4ML\n\n**FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding**\n\n- Homepage: https://sdolivia.github.io/FineGym/\n- Paper: https://arxiv.org/abs/2004.06704\n\n**A Local-to-Global Approach to Multi-modal Movie Scene Segmentation**\n\n- Homepage: https://anyirao.com/projects/SceneSeg.html\n- Paper: https://arxiv.org/abs/2004.02678\n- Code: https://github.com/AnyiRao/SceneSeg\n\n**Deep Homography Estimation for Dynamic Scenes**\n\n- Paper: https://arxiv.org/abs/2004.02132\n- Dataset: https://github.com/lcmhoang/hmg-dynamics\n\n**Assessing Image Quality Issues for Real-World Problems**\n\n- Homepage: https://vizwiz.org/tasks-and-datasets/image-quality-issues/\n- Paper: https://arxiv.org/abs/2003.12511\n\n**UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**\n\n- Paper: https://arxiv.org/abs/2003.10608\n- Code and dataset: https://github.com/Jyouhou/UnrealText/\n\n**PANDA: A Gigapixel-level Human-centric Video Dataset**\n\n- Paper: https://arxiv.org/abs/2003.04852\n- 
Dataset: http://www.panda-dataset.com/\n\n**IntrA: 3D Intracranial Aneurysm Dataset for Deep Learning**\n\n- Paper: https://arxiv.org/abs/2003.02920\n- Dataset: https://github.com/intra3d2019/IntrA\n\n**Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS**\n\n- Paper: https://arxiv.org/abs/2003.03972\n- Dataset: not yet available\n\n<a name=\"Others\"></a>\n\n# Others\n\n**CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus**\n\n- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Kluger_CONSAC_Robust_Multi-Model_Fitting_by_Conditional_Sample_Consensus_CVPR_2020_paper.html\n- Code: https://github.com/fkluger/consac\n\n**Learning to Learn Single Domain Generalization**\n\n- Paper: https://arxiv.org/abs/2003.13216\n- Code: https://github.com/joffery/M-ADA\n\n**Open Compound Domain Adaptation**\n\n- Homepage: https://liuziwei7.github.io/projects/CompoundDomain.html\n- Dataset: https://drive.google.com/drive/folders/1_uNTF8RdvhS_sqVTnYx17hEOQpefmE2r?usp=sharing\n- Paper: https://arxiv.org/abs/1909.03403\n- Code: https://github.com/zhmiao/OpenCompoundDomainAdaptation-OCDA\n\n**Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision**\n\n- Paper: http://www.cvlibs.net/publications/Niemeyer2020CVPR.pdf\n- Code: https://github.com/autonomousvision/differentiable_volumetric_rendering\n\n**QEBA: Query-Efficient Boundary-Based Blackbox Attack**\n\n- Paper: https://arxiv.org/abs/2005.14137\n- Code: https://github.com/AI-secure/QEBA\n\n**Equalization Loss for Long-Tailed Object Recognition**\n\n- Paper: https://arxiv.org/abs/2003.05176\n- Code: https://github.com/tztztztztz/eql.detectron2\n\n**Instance-aware Image Colorization**\n\n- Homepage: https://ericsujw.github.io/InstColorization/\n- Paper: https://arxiv.org/abs/2005.10825\n- Code: https://github.com/ericsujw/InstColorization\n\n**Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting**\n\n- Paper: https://arxiv.org/abs/2005.09704\n- Code: https://github.com/Atlas200dk/sample-imageinpainting-HiFill\n\n**Where am I looking at? 
Joint Location and Orientation Estimation by Cross-View Matching**\n\n- Paper: https://arxiv.org/abs/2005.03860\n- Code: https://github.com/shiyujiao/cross_view_localization_DSM\n\n**Epipolar Transformers**\n\n- Paper: https://arxiv.org/abs/2005.04551\n- Code: https://github.com/yihui-he/epipolar-transformers\n\n**Bringing Old Photos Back to Life**\n\n- Homepage: http://raywzy.com/Old_Photo/\n- Paper: https://arxiv.org/abs/2004.09484\n\n**MaskFlownet: Asymmetric Feature Matching with Learnable Occlusion Mask**\n\n- Paper: https://arxiv.org/abs/2003.10955\n- Code: https://github.com/microsoft/MaskFlownet\n\n**Self-Supervised Viewpoint Learning from Image Collections**\n\n- Paper: https://arxiv.org/abs/2004.01793\n- Paper (alternate link): https://research.nvidia.com/sites/default/files/pubs/2020-03_Self-Supervised-Viewpoint-Learning/SSV-CVPR2020.pdf\n- Code: https://github.com/NVlabs/SSV\n\n**Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations**\n\n- Oral\n- Paper: https://arxiv.org/abs/2003.12237\n- Code: https://github.com/cuishuhao/BNM\n\n**Towards Learning Structure via Consensus for Face Segmentation and Parsing**\n\n- Paper: https://arxiv.org/abs/1911.00957\n- Code: https://github.com/isi-vista/structure_via_consensus\n\n**Plug-and-Play Algorithms for Large-scale Snapshot Compressive Imaging**\n\n- Oral\n- Paper: https://arxiv.org/abs/2003.13654\n- Code: https://github.com/liuyang12/PnP-SCI\n\n**Lightweight Photometric Stereo for Facial Details Recovery**\n\n- Paper: https://arxiv.org/abs/2003.12307\n- Code: https://github.com/Juyong/FacePSNet\n\n**Footprints and Free Space from a Single Color Image**\n\n- Paper: https://arxiv.org/abs/2004.06376\n- Code: https://github.com/nianticlabs/footprints\n\n**Self-Supervised Monocular Scene Flow Estimation**\n\n- Paper: https://arxiv.org/abs/2004.04143\n- Code: https://github.com/visinf/self-mono-sf\n\n**Quasi-Newton Solver for Robust Non-Rigid Registration**\n\n- Paper: https://arxiv.org/abs/2004.04322\n- 
Code: https://github.com/Juyong/Fast_RNRR\n\n**A Local-to-Global Approach to Multi-modal Movie Scene Segmentation**\n\n- Homepage: https://anyirao.com/projects/SceneSeg.html\n- Paper: https://arxiv.org/abs/2004.02678\n- Code: https://github.com/AnyiRao/SceneSeg\n\n**DeepFLASH: An Efficient Network for Learning-based Medical Image Registration**\n\n- Paper: https://arxiv.org/abs/2004.02097\n- Code: https://github.com/jw4hv/deepflash\n\n**Self-Supervised Scene De-occlusion**\n\n- Homepage: https://xiaohangzhan.github.io/projects/deocclusion/\n- Paper: https://arxiv.org/abs/2004.02788\n- Code: https://github.com/XiaohangZhan/deocclusion\n\n**Polarized Reflection Removal with Perfect Alignment in the Wild**\n\n- Homepage: https://leichenyang.weebly.com/project-polarized.html\n- Code: https://github.com/ChenyangLEI/CVPR2020-Polarized-Reflection-Removal-with-Perfect-Alignment\n\n**Background Matting: The World is Your Green Screen**\n\n- Paper: https://arxiv.org/abs/2004.00626\n- Code: http://github.com/senguptaumd/Background-Matting\n\n**What Deep CNNs Benefit from Global Covariance Pooling: An Optimization Perspective**\n\n- Paper: https://arxiv.org/abs/2003.11241\n- Code: https://github.com/ZhangLi-CS/GCP_Optimization\n\n**Look-into-Object: Self-supervised Structure Modeling for Object Recognition**\n\n- Paper: not yet available\n- Code: https://github.com/JDAI-CV/LIO\n\n**Video Object Grounding using Semantic Roles in Language Description**\n\n- Paper: https://arxiv.org/abs/2003.10606\n- Code: https://github.com/TheShadow29/vognet-pytorch\n\n**Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives**\n\n- Paper: https://arxiv.org/abs/2003.10739\n- Code: https://github.com/d-li14/DHM\n\n**SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization**\n\n- Paper: http://www.cs.umd.edu/~yuejiang/papers/SDFDiff.pdf\n- Code: https://github.com/YueJiang-nj/CVPR2020-SDFDiff\n\n**On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location**\n\n- 
Paper: https://arxiv.org/abs/2003.07064\n- Code: https://github.com/oskyhn/CNNs-Without-Borders\n\n**GhostNet: More Features from Cheap Operations**\n\n- Paper: https://arxiv.org/abs/1911.11907\n- Code: https://github.com/iamhankai/ghostnet\n\n**AdderNet: Do We Really Need Multiplications in Deep Learning?**\n\n- Paper: https://arxiv.org/abs/1912.13200\n- Code: https://github.com/huawei-noah/AdderNet\n\n**Deep Image Harmonization via Domain Verification**\n\n- Paper: https://arxiv.org/abs/1911.13239\n- Code: https://github.com/bcmi/Image_Harmonization_Datasets\n\n**Blurry Video Frame Interpolation**\n\n- Paper: https://arxiv.org/abs/2002.12259\n- Code: https://github.com/laomao0/BIN\n\n**Extremely Dense Point Correspondences using a Learned Feature Descriptor**\n\n- Paper: https://arxiv.org/abs/2003.00619\n- Code: https://github.com/lppllppl920/DenseDescriptorLearning-Pytorch\n\n**Filter Grafting for Deep Neural Networks**\n\n- Paper: https://arxiv.org/abs/2001.05868\n- Code: https://github.com/fxmeng/filter-grafting\n- Paper commentary: https://www.zhihu.com/question/372070853/answer/1041569335\n\n**Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation**\n\n- Paper: https://arxiv.org/abs/2003.02824\n- Code: https://github.com/cmhungsteve/SSTDA\n\n**Detecting Attended Visual Targets in Video**\n\n- Paper: https://arxiv.org/abs/2003.02501\n- Code: https://github.com/ejcgt/attention-target-detection\n\n**Deep Image Spatial Transformation for Person Image Generation**\n\n- Paper: https://arxiv.org/abs/2003.00696\n- Code: https://github.com/RenYurui/Global-Flow-Local-Attention\n\n**Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications**\n\n- Paper: https://arxiv.org/abs/2003.01455\n- 
Code: https://github.com/bbrattoli/ZeroShotVideoClassification\n\nhttps://github.com/charlesCXK/3D-SketchAware-SSC\n\nhttps://github.com/Anonymous20192020/Anonymous_CVPR5767\n\nhttps://github.com/avirambh/ScopeFlow\n\nhttps://github.com/csbhr/CDVD-TSP\n\nhttps://github.com/ymcidence/TBH\n\nhttps://github.com/yaoyao-liu/mnemonics\n\nhttps://github.com/meder411/Tangent-Images\n\nhttps://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch\n\nhttps://github.com/sjmoran/deep_local_parametric_filters\n\nhttps://github.com/bermanmaxim/AOWS\n\nhttps://github.com/dc3ea9f/look-into-object\n\n<a name=\"Not-Sure\"></a>\n\n# Acceptance Unconfirmed\n\n**FADNet: A Fast and Accurate Network for Disparity Estimation**\n\n- Paper: not yet released\n- Code: https://github.com/HKBU-HPML/FADNet\n\nhttps://github.com/rFID-submit/RandomFID (acceptance unconfirmed)\n\nhttps://github.com/JackSyu/AE-MSR (acceptance unconfirmed)\n\nhttps://github.com/fastconvnets/cvpr2020 (acceptance unconfirmed)\n\nhttps://github.com/aimagelab/meshed-memory-transformer (acceptance unconfirmed)\n\nhttps://github.com/TWSFar/CRGNet (acceptance unconfirmed)\n\nhttps://github.com/CVPR-2020/CDARTS (acceptance unconfirmed)\n\nhttps://github.com/anucvml/ddn-cvprw2020 (acceptance unconfirmed)\n\nhttps://github.com/dl-model-recommend/model-trust (acceptance unconfirmed)\n\nhttps://github.com/apratimbhattacharyya18/CVPR-2020-Corr-Prior (acceptance unconfirmed)\n\nhttps://github.com/onetcvpr/O-Net (acceptance unconfirmed)\n\nhttps://github.com/502463708/Microcalcification_Detection (acceptance unconfirmed)\n\nhttps://github.com/anonymous-for-review/cvpr-2020-deep-smoke-machine (acceptance unconfirmed)\n\nhttps://github.com/anonymous-for-review/cvpr-2020-smoke-recognition-dataset (acceptance unconfirmed)\n\nhttps://github.com/cvpr-nonrigid/dataset (acceptance unconfirmed)\n\nhttps://github.com/theFool32/PPBA (acceptance unconfirmed)\n\nhttps://github.com/Realtime-Action-Recognition/Realtime-Action-Recognition"
  },
  {
    "path": "CVPR2021-Papers-with-Code.md",
    "content": "# CVPR 2021 论文和开源项目合集(Papers with Code)\n\n[CVPR 2021](http://cvpr2021.thecvf.com/) 论文和开源项目合集(papers with code)！\n\nCVPR 2021 收录列表：http://cvpr2021.thecvf.com/sites/default/files/2021-03/accepted_paper_ids.txt\n\n> 注1：欢迎各位大佬提交issue，分享CVPR 2021论文和开源项目！\n>\n> 注2：关于往年CV顶会论文以及其他优质CV论文和大盘点，详见： https://github.com/amusi/daily-paper-computer-vision\n\n如果你想了解最新最优质的的CV论文、开源项目和学习资料，欢迎扫码加入【CVer学术交流群】！互相学习，一起进步~ \n\n![](CVer学术交流群.png)\n\n## 【CVPR 2021 论文开源目录】\n\n- [Best Paper](#Best-Paper)\n- [Backbone](#Backbone)\n- [NAS](#NAS)\n- [GAN](#GAN)\n- [VAE](#VAE)\n- [Visual Transformer](#Visual-Transformer)\n- [Regularization](#Regularization)\n- [SLAM](#SLAM)\n- [长尾分布(Long-Tailed)](#Long-Tailed)\n- [数据增广(Data Augmentation)](#DA)\n- [无监督/自监督(Self-Supervised)](#Un/Self-Supervised)\n- [半监督(Semi-Supervised)](#Semi-Supervised)\n- [胶囊网络(Capsule Network)](#Capsule-Network)\n- [图像分类(Image Classification](#Image-Classification)\n- [2D目标检测(Object Detection)](#Object-Detection)\n- [单/多目标跟踪(Object Tracking)](#Object-Tracking)\n- [语义分割(Semantic Segmentation)](#Semantic-Segmentation)\n- [实例分割(Instance Segmentation)](#Instance-Segmentation)\n- [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation)\n- [医学图像分割(Medical Image Segmentation)](#Medical-Image-Segmentation)\n- [视频目标分割(Video-Object-Segmentation)](#VOS)\n- [交互式视频目标分割(Interactive-Video-Object-Segmentation)](#IVOS)\n- [显著性检测(Saliency Detection)](#Saliency-Detection)\n- [伪装物体检测(Camouflaged Object Detection)](#Camouflaged-Object-Detection)\n- [协同显著性检测(Co-Salient Object Detection)](#CoSOD)\n- [图像抠图(Image Matting)](#Matting)\n- [行人重识别(Person Re-identification)](#Re-ID)\n- [行人搜索(Person Search)](#Person-Search)\n- [视频理解/行为识别(Video Understanding)](#Video-Understanding)\n- [人脸识别(Face Recognition)](#Face-Recognition)\n- [人脸检测(Face Detection)](#Face-Detection)\n- [人脸活体检测(Face Anti-Spoofing)](#Face-Anti-Spoofing)\n- [Deepfake检测(Deepfake Detection)](#Deepfake-Detection)\n- [人脸年龄估计(Age-Estimation)](#Age-Estimation)\n- 
[Facial Expression Recognition](#FER)\n- [Deepfakes](#Deepfakes)\n- [Human Parsing](#Human-Parsing)\n- [2D/3D Human Pose Estimation](#Human-Pose-Estimation)\n- [Animal Pose Estimation](#Animal-Pose-Estimation)\n- [Hand Pose Estimation](#Hand-Pose-Estimation)\n- [Human Volumetric Capture](#Human-Volumetric-Capture)\n- [Scene Text Recognition](#Scene-Text-Recognition)\n- [Image Compression](#Image-Compression)\n- [Model Compression/Pruning/Quantization](#Model-Compression)\n- [Knowledge Distillation](#KD)\n- [Super-Resolution](#Super-Resolution)\n- [Dehazing](#Dehazing)\n- [Image Restoration](#Image-Restoration)\n- [Image Inpainting](#Image-Inpainting)\n- [Image Editing](#Image-Editing)\n- [Image Captioning](#Image-Captioning)\n- [Font Generation](#Font-Generation)\n- [Image Matching](#Image-Matching)\n- [Image Blending](#Image-Blending)\n- [Reflection Removal](#Reflection-Removal)\n- [3D Point Cloud Classification](#3D-C)\n- [3D Object Detection](#3D-Object-Detection)\n- [3D Semantic Segmentation](#3D-Semantic-Segmentation)\n- [3D Panoptic Segmentation](#3D-Panoptic-Segmentation)\n- [3D Object Tracking](#3D-Object-Tracking)\n- [3D Point Cloud Registration](#3D-PointCloud-Registration)\n- [3D Point Cloud Completion](#3D-Point-Cloud-Completion)\n- [3D Reconstruction](#3D-Reconstruction)\n- [6D Pose Estimation](#6D-Pose-Estimation)\n- [Camera Pose Estimation](#Camera-Pose-Estimation)\n- [Depth Estimation](#Depth-Estimation)\n- [Stereo Matching](#Stereo-Matching)\n- [Optical Flow Estimation](#Flow-Estimation)\n- [Lane Detection](#Lane-Detection)\n- [Trajectory Prediction](#Trajectory-Prediction)\n- [Crowd Counting](#Crowd-Counting)\n- [Adversarial Examples](#AE)\n- [Image Retrieval](#Image-Retrieval)\n- [Video Retrieval](#Video-Retrieval)\n- 
[Cross-modal Retrieval](#Cross-modal-Retrieval)\n- [Zero-Shot Learning](#Zero-Shot-Learning)\n- [Federated Learning](#Federated-Learning)\n- [Video Frame Interpolation](#Video-Frame-Interpolation)\n- [Visual Reasoning](#Visual-Reasoning)\n- [Image Synthesis](#Image-Synthesis)\n- [View Synthesis](#Visual-Synthesis)\n- [Style Transfer](#Style-Transfer)\n- [Layout Generation](#Layout-Generation)\n- [Domain Generalization](#Domain-Generalization)\n- [Domain Adaptation](#Domain-Adaptation)\n- [Open-Set](#Open-Set)\n- [Adversarial Attack](#Adversarial-Attack)\n- [Human-Object Interaction (HOI) Detection](#HOI)\n- [Shadow Removal](#Shadow-Removal)\n- [Virtual Try-On](#Virtual-Try-On)\n- [Label Noise](#Label-Noise)\n- [Video Stabilization](#Video-Stabilization)\n- [Datasets](#Datasets)\n- [Others](#Others)\n- [TODO](#TO-DO)\n- [Not Sure Whether Accepted](#Not-Sure)\n\n<a name=\"Best-Paper\"></a>\n\n# Best Paper\n\n**GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields**\n\n- Homepage: https://m-niemeyer.github.io/project-pages/giraffe/index.html\n- Paper(Oral): https://arxiv.org/abs/2011.12100\n\n- Code: https://github.com/autonomousvision/giraffe\n\n- Demo: http://www.youtube.com/watch?v=fIaDXC-qRSg&vq=hd1080&autoplay=1\n\n<a name=\"Backbone\"></a>\n\n# Backbone\n\n**HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers**\n\n- Paper(Oral): https://arxiv.org/abs/2106.06560\n\n- Code: https://github.com/dingmyu/HR-NAS\n\n**BCNet: Searching for Network Width with Bilaterally Coupled Network**\n\n- Paper: https://arxiv.org/abs/2105.10533\n- Code: None\n\n**Decoupled Dynamic Filter Networks**\n\n- Homepage: https://thefoxofsky.github.io/project_pages/ddf\n- Paper: https://arxiv.org/abs/2104.14107\n- Code: https://github.com/thefoxofsky/DDF\n\n**Lite-HRNet: A Lightweight High-Resolution Network**\n\n- Paper: https://arxiv.org/abs/2104.06403\n- 
Code: https://github.com/HRNet/Lite-HRNet\n\n**CondenseNet V2: Sparse Feature Reactivation for Deep Networks**\n\n- Paper: https://arxiv.org/abs/2104.04382\n\n- Code: https://github.com/jianghaojun/CondenseNetV2\n\n**Diverse Branch Block: Building a Convolution as an Inception-like Unit**\n\n- Paper: https://arxiv.org/abs/2103.13425\n\n- Code: https://github.com/DingXiaoH/DiverseBranchBlock\n\n**Scaling Local Self-Attention For Parameter Efficient Visual Backbones**\n\n- Paper(Oral): https://arxiv.org/abs/2103.12731\n\n- Code: None\n\n**ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network**\n\n- Paper: https://arxiv.org/abs/2007.00992\n- Code: https://github.com/clovaai/rexnet\n\n**Involution: Inverting the Inherence of Convolution for Visual Recognition**\n\n- Paper: https://arxiv.org/abs/2103.06255\n- Code: https://github.com/d-li14/involution\n\n**Coordinate Attention for Efficient Mobile Network Design**\n\n- Paper: https://arxiv.org/abs/2103.02907\n- Code: https://github.com/Andrew-Qibin/CoordAttention\n\n**Inception Convolution with Efficient Dilation Search**\n\n- Paper: https://arxiv.org/abs/2012.13587\n- Code: https://github.com/yifan123/IC-Conv\n\n**RepVGG: Making VGG-style ConvNets Great Again**\n\n- Paper: https://arxiv.org/abs/2101.03697\n- Code: https://github.com/DingXiaoH/RepVGG\n\n<a name=\"NAS\"></a>\n\n# NAS\n\n**HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers**\n\n- Paper(Oral): https://arxiv.org/abs/2106.06560\n\n- Code: https://github.com/dingmyu/HR-NAS\n\n**BCNet: Searching for Network Width with Bilaterally Coupled Network**\n\n- Paper: https://arxiv.org/abs/2105.10533\n- Code: None\n\n**ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search**\n\n- Paper: https://arxiv.org/abs/2105.10154\n- Code: None\n\n**Combined Depth Space based Architecture Search For Person Re-identification**\n\n- Paper: https://arxiv.org/abs/2104.04163\n- Code: None\n\n**DiNTS: 
Differentiable Neural Network Topology Search for 3D Medical Image Segmentation**\n\n- Paper(Oral): https://arxiv.org/abs/2103.15954\n- Code: None\n\n**HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers**\n\n- Paper(Oral): None\n- Code: https://github.com/dingmyu/HR-NAS\n\n**Neural Architecture Search with Random Labels**\n\n- Paper: https://arxiv.org/abs/2101.11834\n- Code: None\n\n**Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search**\n\n- Paper: https://arxiv.org/abs/2101.11342\n- Code: None\n\n**Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation**\n\n- Paper:  https://arxiv.org/abs/2105.12971 \n- Code: None\n\n**Prioritized Architecture Sampling with Monto-Carlo Tree Search**\n\n- Paper: https://arxiv.org/abs/2103.11922\n- Code: https://github.com/xiusu/NAS-Bench-Macro\n\n**Contrastive Neural Architecture Search with Neural Architecture Comparators**\n\n- Paper: https://arxiv.org/abs/2103.05471\n- Code: https://github.com/chenyaofo/CTNAS\n\n**AttentiveNAS: Improving Neural Architecture Search via Attentive** \n\n- Paper: https://arxiv.org/abs/2011.09011\n- Code: None\n\n**ReNAS: Relativistic Evaluation of Neural Architecture Search**\n\n- Paper: https://arxiv.org/abs/1910.01523\n- Code: None\n\n**HourNAS: Extremely Fast Neural Architecture**\n\n- Paper: https://arxiv.org/abs/2005.14446\n- Code: None\n\n**Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator**\n\n- Paper: https://arxiv.org/abs/2103.07289\n- Code: https://github.com/eric8607242/SGNAS\n\n**OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection**\n\n- Paper: https://arxiv.org/abs/2103.04507\n- Code: https://github.com/VDIGPKU/OPANAS\n\n**Inception Convolution with Efficient Dilation Search**\n\n- Paper:  https://arxiv.org/abs/2012.13587 \n- Code: None\n\n<a name=\"GAN\"></a>\n\n# GAN\n\n**High-Resolution Photorealistic 
Image Translation in Real-Time: A Laplacian Pyramid Translation Network**\n\n- Paper: https://arxiv.org/abs/2105.09188\n- Code: https://github.com/csjliang/LPTN\n- Dataset: https://github.com/csjliang/LPTN\n\n**DG-Font: Deformable Generative Networks for Unsupervised Font Generation**\n\n- Paper: https://arxiv.org/abs/2104.03064\n\n- Code: https://github.com/ecnuycxie/DG-Font\n\n**PD-GAN: Probabilistic Diverse GAN for Image Inpainting**\n\n- Paper: https://arxiv.org/abs/2105.02201\n- Code: https://github.com/KumapowerLIU/PD-GAN\n\n**StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing**\n\n- Paper: https://arxiv.org/abs/2104.14754\n- Code: https://github.com/naver-ai/StyleMapGAN\n- Demo Video: https://youtu.be/qCapNyRA_Ng\n\n**Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer**\n\n- Paper: https://arxiv.org/abs/2104.05376\n- Code: https://github.com/PaddlePaddle/PaddleGAN/\n\n**Regularizing Generative Adversarial Networks under Limited Data**\n\n- Homepage: https://hytseng0509.github.io/lecam-gan/\n- Paper: https://faculty.ucmerced.edu/mhyang/papers/cvpr2021_gan_limited_data.pdf\n- Code: https://github.com/google/lecam-gan\n\n**Towards Real-World Blind Face Restoration with Generative Facial Prior**\n\n- Paper: https://arxiv.org/abs/2101.04061\n- Code: None\n\n**TediGAN: Text-Guided Diverse Image Generation and Manipulation**\n\n- Homepage: https://xiaweihao.com/projects/tedigan/\n\n- Paper: https://arxiv.org/abs/2012.03308\n- Code: https://github.com/weihaox/TediGAN\n\n**Generative Hierarchical Features from Synthesizing Image**\n\n- Homepage: https://genforce.github.io/ghfeat/\n\n- Paper(Oral): https://arxiv.org/abs/2007.10379\n- Code: https://github.com/genforce/ghfeat\n\n**Teachers Do More Than Teach: Compressing Image-to-Image Models**\n\n- Paper: https://arxiv.org/abs/2103.03467\n- Code: https://github.com/snap-research/CAT\n\n**HistoGAN: Controlling Colors of GAN-Generated and 
Real Images via Color Histograms**\n\n- Paper: https://arxiv.org/abs/2011.11731\n- Code: https://github.com/mahmoudnafifi/HistoGAN\n\n**pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis**\n\n- Homepage: https://marcoamonteiro.github.io/pi-GAN-website/\n\n- Paper(Oral): https://arxiv.org/abs/2012.00926\n- Code: None\n\n**DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network**\n\n- Paper: https://arxiv.org/abs/2103.07893\n- Code: None\n\n**Diverse Semantic Image Synthesis via Probability Distribution Modeling**\n\n- Paper: https://arxiv.org/abs/2103.06878\n- Code: https://github.com/tzt101/INADE.git\n\n**LOHO: Latent Optimization of Hairstyles via Orthogonalization**\n\n- Paper: https://arxiv.org/abs/2103.03891\n- Code: None\n\n**PISE: Person Image Synthesis and Editing with Decoupled GAN**\n\n- Paper: https://arxiv.org/abs/2103.04023\n- Code: https://github.com/Zhangjinso/PISE\n\n**DeFLOCNet: Deep Image Editing via Flexible Low-level Controls**\n\n- Paper: http://raywzy.com/\n- Code: http://raywzy.com/\n\n**PD-GAN: Probabilistic Diverse GAN for Image Inpainting**\n\n- Paper: http://raywzy.com/\n- Code: http://raywzy.com/\n\n**Efficient Conditional GAN Transfer with Knowledge Propagation across Classes**\n\n- Paper: https://www.researchgate.net/publication/349309756_Efficient_Conditional_GAN_Transfer_with_Knowledge_Propagation_across_Classes\n- Code: http://github.com/mshahbazi72/cGANTransfer\n\n**Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing**\n\n- Paper: None\n- Code: None\n\n**Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs**\n\n- Paper: https://arxiv.org/abs/2011.14107\n- Code: None\n\n**Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation**\n\n- Homepage: https://eladrich.github.io/pixel2style2pixel/\n- Paper: https://arxiv.org/abs/2008.00951\n- Code: https://github.com/eladrich/pixel2style2pixel\n\n**A 3D GAN for Improved Large-pose 
Facial Recognition**\n\n- Paper: https://arxiv.org/abs/2012.10545\n- Code: None\n\n**HumanGAN: A Generative Model of Humans Images**\n\n- Paper: https://arxiv.org/abs/2103.06902\n- Code: None\n\n**ID-Unet: Iterative Soft and Hard Deformation for View Synthesis**\n\n- Paper: https://arxiv.org/abs/2103.02264\n- Code: https://github.com/MingyuY/Iterative-view-synthesis\n\n**CoMoGAN: continuous model-guided image-to-image translation**\n\n- Paper(Oral): https://arxiv.org/abs/2103.06879\n- Code: https://github.com/cv-rits/CoMoGAN\n\n**Training Generative Adversarial Networks in One Stage**\n\n- Paper: https://arxiv.org/abs/2103.00430\n- Code: None\n\n**Closed-Form Factorization of Latent Semantics in GANs**\n\n- Homepage: https://genforce.github.io/sefa/\n- Paper(Oral): https://arxiv.org/abs/2007.06600\n- Code: https://github.com/genforce/sefa\n\n**Anycost GANs for Interactive Image Synthesis and Editing**\n\n- Paper: https://arxiv.org/abs/2103.03243\n- Code: https://github.com/mit-han-lab/anycost-gan\n\n**Image-to-image Translation via Hierarchical Style Disentanglement**\n\n- Paper: https://arxiv.org/abs/2103.01456\n- Code: https://github.com/imlixinyang/HiSD\n\n<a name=\"VAE\"></a>\n\n# VAE\n\n**Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders**\n\n- Homepage: https://taldatech.github.io/soft-intro-vae-web/\n\n- Paper: https://arxiv.org/abs/2012.13253\n- Code: https://github.com/taldatech/soft-intro-vae-pytorch\n\n<a name=\"Visual-Transformer\"></a>\n\n# Visual Transformer\n\n**1. End-to-End Human Pose and Mesh Reconstruction with Transformers**\n\n- Paper: https://arxiv.org/abs/2012.09760\n- Code: https://github.com/microsoft/MeshTransformer\n\n**2. Temporal-Relational CrossTransformers for Few-Shot Action Recognition**\n\n- Paper: https://arxiv.org/abs/2101.06184\n- Code: https://github.com/tobyperrett/trx\n\n**3. 
Kaleido-BERT: Vision-Language Pre-training on Fashion Domain**\n\n- Paper: https://arxiv.org/abs/2103.16110\n- Code: https://github.com/mczhuge/Kaleido-BERT\n\n**4. HOTR: End-to-End Human-Object Interaction Detection with Transformers**\n\n- Paper: https://arxiv.org/abs/2104.13682\n- Code: https://github.com/kakaobrain/HOTR\n\n**5. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving**\n\n- Paper: https://arxiv.org/abs/2104.09224\n- Code: https://github.com/autonomousvision/transfuser\n\n**6. Pose Recognition with Cascade Transformers**\n\n- Paper: https://arxiv.org/abs/2104.06976\n\n- Code: https://github.com/mlpc-ucsd/PRTR\n\n**7. Variational Transformer Networks for Layout Generation**\n\n- Paper: https://arxiv.org/abs/2104.02416\n- Code: None\n\n**8. LoFTR: Detector-Free Local Feature Matching with Transformers**\n\n- Homepage: https://zju3dv.github.io/loftr/\n- Paper: https://arxiv.org/abs/2104.00680\n- Code: https://github.com/zju3dv/LoFTR\n\n**9. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers**\n\n- Paper: https://arxiv.org/abs/2012.15840\n- Code: https://github.com/fudan-zvg/SETR\n\n**10. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers**\n\n- Paper: https://arxiv.org/abs/2103.16553\n- Code: None\n\n**11. Transformer Tracking**\n\n- Paper: https://arxiv.org/abs/2103.15436\n- Code: https://github.com/chenxin-dlut/TransT\n\n**12. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers**\n\n- Paper(Oral): https://arxiv.org/abs/2106.06560\n- Code: https://github.com/dingmyu/HR-NAS\n\n**13. MIST: Multiple Instance Spatial Transformer**\n\n- Paper: https://arxiv.org/abs/1811.10725\n- Code: None\n\n**14. Multimodal Motion Prediction with Stacked Transformers**\n\n- Paper: https://arxiv.org/abs/2103.11624\n- Code: https://decisionforce.github.io/mmTransformer\n\n**15. 
Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning**\n\n- Paper: https://www.amazon.science/publications/revamping-cross-modal-recipe-retrieval-with-hierarchical-transformers-and-self-supervised-learning\n\n- Code: https://github.com/amzn/image-to-recipe-transformers\n\n**16. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking**\n\n- Paper(Oral): https://arxiv.org/abs/2103.11681\n\n- Code: https://github.com/594422814/TransformerTrack\n\n**17. Pre-Trained Image Processing Transformer**\n\n- Paper:  https://arxiv.org/abs/2012.00364 \n- Code: None\n\n**18. End-to-End Video Instance Segmentation with Transformers**\n\n- Paper(Oral): https://arxiv.org/abs/2011.14503\n- Code: https://github.com/Epiphqny/VisTR\n\n**19. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers**\n\n- Paper(Oral): https://arxiv.org/abs/2011.09094\n- Code: https://github.com/dddzg/up-detr\n\n**20. End-to-End Human Object Interaction Detection with HOI Transformer**\n\n- Paper: https://arxiv.org/abs/2103.04503\n- Code: https://github.com/bbepoch/HoiTransformer\n\n**21. Transformer Interpretability Beyond Attention Visualization** \n\n- Paper: https://arxiv.org/abs/2012.09838\n- Code: https://github.com/hila-chefer/Transformer-Explainability\n\n**22. Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer**\n\n- Paper: None\n- Code: None\n\n**23. LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity**\n\n- Paper: None\n- Code: None\n\n**24. Line Segment Detection Using Transformers without Edges**\n\n- Paper(Oral): https://arxiv.org/abs/2101.01909\n- Code: None\n\n**25. MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers**\n\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_MaX-DeepLab_End-to-End_Panoptic_Segmentation_With_Mask_Transformers_CVPR_2021_paper.html\n- Code: None\n\n**26. 
SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation**\n\n- Paper(Oral): https://arxiv.org/abs/2101.08833\n- Code: https://github.com/dukebw/SSTVOS\n\n**27. Facial Action Unit Detection With Transformers**\n\n- Paper: None\n- Code: None\n\n**28. Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition**\n\n- Paper: None\n- Code: None\n\n**29. Lesion-Aware Transformers for Diabetic Retinopathy Grading**\n\n- Paper: None\n- Code: None\n\n**30. Topological Planning With Transformers for Vision-and-Language Navigation**\n\n- Paper: https://arxiv.org/abs/2012.05292\n- Code: None\n\n**31. Adaptive Image Transformer for One-Shot Object Detection**\n\n- Paper: None\n- Code: None\n\n**32. Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos**\n\n- Paper: None\n- Code: None\n\n**33. Taming Transformers for High-Resolution Image Synthesis**\n\n- Homepage: https://compvis.github.io/taming-transformers/\n- Paper(Oral): https://arxiv.org/abs/2012.09841\n- Code: https://github.com/CompVis/taming-transformers\n\n**34. Self-Supervised Video Hashing via Bidirectional Transformers**\n\n- Paper: None\n- Code: None\n\n**35. Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos**\n\n- Paper(Oral): https://hehefan.github.io/pdfs/p4transformer.pdf\n- Code: None\n\n**36. Gaussian Context Transformer**\n\n- Paper: None\n- Code: None\n\n**37. General Multi-Label Image Classification With Transformers**\n\n- Paper: https://arxiv.org/abs/2011.14027\n- Code: None\n\n**38. Bottleneck Transformers for Visual Recognition**\n\n- Paper: https://arxiv.org/abs/2101.11605\n- Code: None\n\n**39. VLN BERT: A Recurrent Vision-and-Language BERT for Navigation**\n\n- Paper(Oral): https://arxiv.org/abs/2011.13922\n- Code: https://github.com/YicongHong/Recurrent-VLN-BERT\n\n**40. 
Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling**\n\n- Paper(Oral): https://arxiv.org/abs/2102.06183\n- Code: https://github.com/jayleicn/ClipBERT\n\n**41. Self-attention based Text Knowledge Mining for Text Detection**\n\n- Paper: None\n- Code: https://github.com/CVI-SZU/STKM\n\n**42. SSAN: Separable Self-Attention Network for Video Representation Learning**\n\n- Paper: None\n- Code: None\n\n**43. Scaling Local Self-Attention For Parameter Efficient Visual Backbones**\n\n- Paper(Oral): https://arxiv.org/abs/2103.12731\n\n- Code: None\n\n<a name=\"Regularization\"></a>\n\n# Regularization\n\n**Regularizing Neural Networks via Adversarial Model Perturbation**\n\n- Paper: https://arxiv.org/abs/2010.04925\n- Code: https://github.com/hiyouga/AMP-Regularizer\n\n<a name=\"SLAM\"></a>\n\n# SLAM\n\n**Differentiable SLAM-net: Learning Particle SLAM for Visual Navigation**\n\n- Paper: https://arxiv.org/abs/2105.07593\n- Code: None\n\n**Generalizing to the Open World: Deep Visual Odometry with Online Adaptation**\n\n- Paper: https://arxiv.org/abs/2103.15279\n- Code: https://arxiv.org/abs/2103.15279\n\n<a name=\"Long-Tailed\"></a>\n\n# Long-Tailed\n\n**Adversarial Robustness under Long-Tailed Distribution**\n\n- Paper(Oral): https://arxiv.org/abs/2104.02703\n- Code: https://github.com/wutong16/Adversarial_Long-Tail\n\n**Distribution Alignment: A Unified Framework for Long-tail Visual Recognition**\n\n- Paper: https://arxiv.org/abs/2103.16370\n- Code: https://github.com/Megvii-BaseDetection/DisAlign\n\n**Adaptive Class Suppression Loss for Long-Tail Object Detection**\n\n- Paper: https://arxiv.org/abs/2104.00885\n- Code: https://github.com/CASIA-IVA-Lab/ACSL\n\n**Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification**\n\n- Paper: https://arxiv.org/abs/2103.14267\n- Code: None\n\n<a name=\"DA\"></a>\n\n# Data Augmentation\n\n**Scale-aware Automatic Augmentation for Object Detection**\n\n- Paper: 
https://arxiv.org/abs/2103.17220\n\n- Code: https://github.com/Jia-Research-Lab/SA-AutoAug\n\n<a name=\"Un/Self-Supervised\"></a>\n\n# Unsupervised/Self-Supervised\n\n**Domain-Specific Suppression for Adaptive Object Detection**\n\n- Paper: https://arxiv.org/abs/2105.03570\n- Code: None\n\n**A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning**\n\n- Paper: https://arxiv.org/abs/2104.14558\n\n- Code: https://github.com/facebookresearch/SlowFast\n\n**Unsupervised Multi-Source Domain Adaptation for Person Re-Identification**\n\n- Paper: https://arxiv.org/abs/2104.12961\n- Code: None\n\n**Self-supervised Video Representation Learning by Context and Motion Decoupling**\n\n- Paper: https://arxiv.org/abs/2104.00862\n- Code: None\n\n**Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning**\n\n- Homepage: https://fingerrec.github.io/index_files/jinpeng/papers/CVPR2021/project_website.html\n- Paper: https://arxiv.org/abs/2009.05769\n- Code: https://github.com/FingerRec/BE\n\n**Spatially Consistent Representation Learning**\n\n- Paper: https://arxiv.org/abs/2103.06122\n- Code: None\n\n**VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples**\n\n- Paper: https://arxiv.org/abs/2103.05905\n- Code: https://github.com/tinapan-pt/VideoMoCo\n\n**Exploring Simple Siamese Representation Learning**\n\n- Paper(Oral): https://arxiv.org/abs/2011.10566\n- Code: None\n\n**Dense Contrastive Learning for Self-Supervised Visual Pre-Training**\n\n- Paper(Oral): https://arxiv.org/abs/2011.09157\n- Code: https://github.com/WXinlong/DenseCL\n\n<a name=\"Semi-Supervised\"></a>\n\n# Semi-Supervised Learning\n\n**Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework**\n\n- Affiliations: Alibaba\n\n- Paper: https://arxiv.org/abs/2103.11402\n- Code: None\n\n**Adaptive Consistency Regularization for Semi-Supervised Transfer Learning**\n\n- Paper: 
https://arxiv.org/abs/2103.02193\n- Code: https://github.com/SHI-Labs/Semi-Supervised-Transfer-Learning\n\n<a name=\"Capsule-Network\"></a>\n\n# Capsule Network\n\n**Capsule Network is Not More Robust than Convolutional Network**\n\n- Paper: https://arxiv.org/abs/2103.15459\n- Code: None\n\n<a name=\"Image-Classification\"></a>\n\n# Image Classification\n\n**Correlated Input-Dependent Label Noise in Large-Scale Image Classification**\n\n- Paper(Oral): https://arxiv.org/abs/2105.10305\n- Code: https://github.com/google/uncertainty-baselines/tree/master/baselines/imagenet\n\n<a name=\"Object-Detection\"></a>\n\n# 2D Object Detection\n\n## 2D Object Detection\n\n**1. Scaled-YOLOv4: Scaling Cross Stage Partial Network**\n\n- Affiliations: Academia Sinica, Intel, Providence University\n- Paper: https://arxiv.org/abs/2011.08036\n- Code: https://github.com/WongKinYiu/ScaledYOLOv4\n- Chinese write-up: [YOLOv4官方改进版来了！55.8% AP！速度最高达1774 FPS，Scaled-YOLOv4正式开源！](https://mp.weixin.qq.com/s/AcrJPNoAVhn8cGBUGK7ekA)\n\n**2. You Only Look One-level Feature**\n\n- Affiliations: Chinese Academy of Sciences, University of Chinese Academy of Sciences, Megvii\n- Paper: https://arxiv.org/abs/2103.09460\n- Code: https://github.com/megvii-model/YOLOF\n- Chinese write-up: [CVPR 2021 | 没有FPN！中科院&旷视提出YOLOF：你只需看一层特征](https://mp.weixin.qq.com/s/EJqAG1gTVaP2icI6QL742A)\n\n**3. Sparse R-CNN: End-to-End Object Detection with Learnable Proposals**\n\n- Affiliations: The University of Hong Kong, Tongji University, ByteDance AI Lab, UC Berkeley\n- Paper: https://arxiv.org/abs/2011.12450\n- Code: https://github.com/PeizeSun/SparseR-CNN\n- Chinese write-up: [目标检测新范式！港大同济伯克利提出Sparse R-CNN，代码刚刚开源！](https://mp.weixin.qq.com/s/P2Zgh1wTqf8L2976El5nfQ)\n\n**4. End-to-End Object Detection with Fully Convolutional Network**\n\n- Affiliations: Megvii, Xi'an Jiaotong University\n- Paper: https://arxiv.org/abs/2012.03544\n- Code: https://github.com/Megvii-BaseDetection/DeFCN\n\n**5. 
Dynamic Head: Unifying Object Detection Heads with Attentions**\n\n- Affiliations: Microsoft\n- Paper: https://arxiv.org/abs/2106.08322\n- Code: https://github.com/microsoft/DynamicHead\n- Chinese write-up: [60.6 AP！打破COCO记录！微软提出DyHead：将注意力与目标检测Heads统一](https://mp.weixin.qq.com/s/uYPUqVXwNau71VAYW3bYIA)\n\n**6. Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection**\n\n- Affiliations: Nanjing University of Science and Technology, Momenta, Nanjing University, Tsinghua University\n- Paper: https://arxiv.org/abs/2011.12885\n- Code: https://github.com/implus/GFocalV2\n- Chinese write-up: [CVPR 2021 | GFLV2：目标检测良心技术，无Cost涨点！](https://mp.weixin.qq.com/s/JB7k3NwXU-cDueg6w9mghQ)\n\n**7. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers**\n\n- Affiliations: South China University of Technology, Tencent WeChat AI\n- Paper(Oral): https://arxiv.org/abs/2011.09094\n- Code: https://github.com/dddzg/up-detr\n- Chinese write-up: [CVPR 2021 Oral | Transformer再发力！华南理工和微信提出UP-DETR：无监督预训练检测器](https://mp.weixin.qq.com/s/Hprp7B16SGFhVEKXfKiRBQ)\n\n**8. MobileDets: Searching for Object Detection Architectures for Mobile Accelerators**\n\n- Affiliations: University of Wisconsin, Google\n\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Xiong_MobileDets_Searching_for_Object_Detection_Architectures_for_Mobile_Accelerators_CVPR_2021_paper.pdf\n- Code: https://github.com/tensorflow/models/tree/master/research/object_detection\n\n**9. Tracking Pedestrian Heads in Dense Crowd**\n\n- Affiliations: University of Rennes 1\n- Homepage: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sundararaman_Tracking_Pedestrian_Heads_in_Dense_Crowd_CVPR_2021_paper.html\n- Code1: https://github.com/Sentient07/HeadHunter\n- Code2: https://github.com/Sentient07/HeadHunter%E2%80%93T\n- Dataset: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/\n\n**10. Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation**\n\n- Affiliations: Hong Kong University of Science and Technology, Huawei Noah's Ark Lab\n- Paper: https://arxiv.org/abs/2105.12971\n- Code: None\n\n**11. 
PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery**\n\n- Affiliations: A*STAR, Sichuan University, Nanyang Technological University\n- Paper: https://arxiv.org/abs/2105.12990\n- Code: None\n\n**12. IQDet: Instance-wise Quality Distribution Sampling for Object Detection**\n\n- Affiliations: Megvii\n- Paper: https://arxiv.org/abs/2104.06936\n- Code: None\n\n**13. Multi-Scale Aligned Distillation for Low-Resolution Detection**\n\n- Affiliations: The Chinese University of Hong Kong, Adobe Research, SmartMore\n- Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf\n- Code: https://github.com/Jia-Research-Lab/MSAD\n\n**14. Adaptive Class Suppression Loss for Long-Tail Object Detection**\n\n- Affiliations: Chinese Academy of Sciences, University of Chinese Academy of Sciences, ObjectEye, Peking University, Peng Cheng Laboratory, Nexwise\n\n- Paper: https://arxiv.org/abs/2104.00885\n- Code: https://github.com/CASIA-IVA-Lab/ACSL\n\n**15. VarifocalNet: An IoU-aware Dense Object Detector**\n\n- Affiliations: Queensland University of Technology, University of Queensland\n- Paper(Oral): https://arxiv.org/abs/2008.13367\n- Code: https://github.com/hyz-xmaster/VarifocalNet\n\n**16. OTA: Optimal Transport Assignment for Object Detection**\n\n- Affiliations: Waseda University, Megvii\n\n- Paper: https://arxiv.org/abs/2103.14259\n- Code: https://github.com/Megvii-BaseDetection/OTA\n\n**17. Distilling Object Detectors via Decoupled Features**\n\n- Affiliations: Huawei Noah's Ark Lab, University of Sydney\n- Paper: https://arxiv.org/abs/2103.14475\n- Code: https://github.com/ggjy/DeFeat.pytorch\n\n**18. Robust and Accurate Object Detection via Adversarial Learning**\n\n- Affiliations: Google, UCLA, UCSC\n\n- Paper: https://arxiv.org/abs/2103.13886\n\n- Code: None\n\n**19. OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection**\n\n- Affiliations: Peking University, Anyvision, Stony Brook University\n- Paper: https://arxiv.org/abs/2103.04507\n- Code: https://github.com/VDIGPKU/OPANAS\n\n**20. Multiple Instance Active Learning for Object Detection**\n\n- Affiliations: University of Chinese Academy of Sciences, Huawei Noah's Ark Lab, Tsinghua University\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Yuan_Multiple_Instance_Active_Learning_for_Object_Detection_CVPR_2021_paper.pdf\n- Code: https://github.com/yuantn/MI-AOD\n\n**21. 
Towards Open World Object Detection**\n\n- Affiliations: Indian Institute of Technology, MBZUAI, Australian National University, Linköping University\n- Paper(Oral): https://arxiv.org/abs/2103.02603\n- Code: https://github.com/JosephKJ/OWOD\n\n**22. RankDetNet: Delving Into Ranking Constraints for Object Detection**\n\n- Affiliations: Xilinx\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_RankDetNet_Delving_Into_Ranking_Constraints_for_Object_Detection_CVPR_2021_paper.html\n- Code: None\n\n## Rotated Object Detection\n\n**23. Dense Label Encoding for Boundary Discontinuity Free Rotation Detection**\n\n- Affiliations: Shanghai Jiao Tong University, University of Chinese Academy of Sciences\n- Paper: https://arxiv.org/abs/2011.09670\n- Code1: https://github.com/Thinklab-SJTU/DCL_RetinaNet_Tensorflow\n- Code2: https://github.com/yangxue0827/RotationDetection\n\n**24. ReDet: A Rotation-equivariant Detector for Aerial Object Detection**\n\n- Affiliations: Wuhan University\n\n- Paper: https://arxiv.org/abs/2103.07733\n- Code: https://github.com/csuhan/ReDet\n\n**25. Beyond Bounding-Box: Convex-Hull Feature Adaptation for Oriented and Densely Packed Object Detection**\n\n- Affiliations: University of Chinese Academy of Sciences, Tsinghua University\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Guo_Beyond_Bounding-Box_Convex-Hull_Feature_Adaptation_for_Oriented_and_Densely_Packed_CVPR_2021_paper.html\n- Code: https://github.com/SDL-GuoZonghao/BeyondBoundingBox\n\n## Few-Shot Object Detection\n\n**26. Accurate Few-Shot Object Detection With Support-Query Mutual Guidance and Hybrid Loss**\n\n- Affiliations: Fudan University, Tongji University, Zhejiang University\n\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhang_Accurate_Few-Shot_Object_Detection_With_Support-Query_Mutual_Guidance_and_Hybrid_CVPR_2021_paper.html\n- Code: None\n\n**27. Adaptive Image Transformer for One-Shot Object Detection**\n\n- Affiliations: Academia Sinica, Taiwan AI Labs\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Adaptive_Image_Transformer_for_One-Shot_Object_Detection_CVPR_2021_paper.html\n- Code: None\n\n**28. 
Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection**\n\n- 作者单位: 北京大学, 北邮\n- Paper: https://arxiv.org/abs/2103.17115\n- Code: https://github.com/hzhupku/DCNet \n\n**29. Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection**\n\n- 作者单位: 卡内基梅隆大学(CMU)\n\n- Paper: https://arxiv.org/abs/2103.01903\n- Code: None\n\n**30. FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding**\n\n- 作者单位: 南加利福尼亚大学, 旷视科技\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sun_FSCE_Few-Shot_Object_Detection_via_Contrastive_Proposal_Encoding_CVPR_2021_paper.html\n- Code:  https://github.com/MegviiDetection/FSCE \n\n**31. Hallucination Improves Few-Shot Object Detection**\n\n- 作者单位: 伊利诺伊大学厄巴纳-香槟分校\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhang_Hallucination_Improves_Few-Shot_Object_Detection_CVPR_2021_paper.html\n- Code: https://github.com/pppplin/HallucFsDet\n\n**32. Few-Shot Object Detection via Classification Refinement and Distractor Retreatment**\n\n- 作者单位: 新加坡国立大学, SIMTech\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Few-Shot_Object_Detection_via_Classification_Refinement_and_Distractor_Retreatment_CVPR_2021_paper.html\n- Code: None\n\n**33. Generalized Few-Shot Object Detection Without Forgetting**\n\n- 作者单位: 旷视科技\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Fan_Generalized_Few-Shot_Object_Detection_Without_Forgetting_CVPR_2021_paper.html\n- Code: None\n\n**34. Transformation Invariant Few-Shot Object Detection**\n\n- 作者单位: 华为诺亚方舟实验室\n\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Transformation_Invariant_Few-Shot_Object_Detection_CVPR_2021_paper.html\n- Code: None\n\n**35. 
UniT: Unified Knowledge Transfer for Any-Shot Object Detection and Segmentation**\n\n- 作者单位: 不列颠哥伦比亚大学, Vector AI, CIFAR AI Chair\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Khandelwal_UniT_Unified_Knowledge_Transfer_for_Any-Shot_Object_Detection_and_Segmentation_CVPR_2021_paper.html\n- Code: https://github.com/ubc-vision/UniT\n\n**36. Beyond Max-Margin: Class Margin Equilibrium for Few-Shot Object Detection**\n\n- 作者单位: 国科大, 厦门大学, 鹏城实验室\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Beyond_Max-Margin_Class_Margin_Equilibrium_for_Few-Shot_Object_Detection_CVPR_2021_paper.html\n- Code: https://github.com/Bohao-Lee/CME\n\n## 半监督目标检测\n\n**37. Points As Queries: Weakly Semi-Supervised Object Detection by Points**\n\n- 作者单位: 旷视科技, 复旦大学\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Points_As_Queries_Weakly_Semi-Supervised_Object_Detection_by_Points_CVPR_2021_paper.html\n- Code: None\n\n**38. Data-Uncertainty Guided Multi-Phase Learning for Semi-Supervised Object Detection**\n\n- 作者单位: 清华大学\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Data-Uncertainty_Guided_Multi-Phase_Learning_for_Semi-Supervised_Object_Detection_CVPR_2021_paper.html\n- Code: None\n\n**39. Positive-Unlabeled Data Purification in the Wild for Object Detection**\n\n- 作者单位: 华为诺亚方舟实验室, 悉尼大学, 北京大学\n\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Guo_Positive-Unlabeled_Data_Purification_in_the_Wild_for_Object_Detection_CVPR_2021_paper.html\n- Code: None\n\n**40. Interactive Self-Training With Mean Teachers for Semi-Supervised Object Detection**\n\n- 作者单位: 阿里巴巴, 香港理工大学\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Yang_Interactive_Self-Training_With_Mean_Teachers_for_Semi-Supervised_Object_Detection_CVPR_2021_paper.html\n- Code: None\n\n**41. Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework**\n\n- 作者单位: 阿里巴巴\n- Paper: https://arxiv.org/abs/2103.11402\n- Code: None\n\n**42. 
Humble Teachers Teach Better Students for Semi-Supervised Object Detection**\n\n- 作者单位: 卡内基梅隆大学(CMU), 亚马逊\n- Homepage: https://yihet.com/humble-teacher\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Tang_Humble_Teachers_Teach_Better_Students_for_Semi-Supervised_Object_Detection_CVPR_2021_paper.html\n- Code: https://github.com/lryta/HumbleTeacher\n\n**43. Interpolation-Based Semi-Supervised Learning for Object Detection**\n\n- 作者单位: 首尔大学, 阿尔托大学等\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Jeong_Interpolation-Based_Semi-Supervised_Learning_for_Object_Detection_CVPR_2021_paper.html\n- Code: https://github.com/soo89/ISD-SSD\n\n## 域自适应目标检测\n\n**44. Domain-Specific Suppression for Adaptive Object Detection**\n\n- 作者单位: 中科院, 寒武纪, 国科大\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Domain-Specific_Suppression_for_Adaptive_Object_Detection_CVPR_2021_paper.html\n- Code: None\n\n**45. MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection**\n\n- 作者单位: 约翰斯·霍普金斯大学, 梅赛德斯—奔驰\n- Paper: https://arxiv.org/abs/2103.04224\n- Code: None\n\n**46. Unbiased Mean Teacher for Cross-Domain Object Detection**\n\n- 作者单位: 电子科技大学, ETH Zurich\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Deng_Unbiased_Mean_Teacher_for_Cross-Domain_Object_Detection_CVPR_2021_paper.html\n- Code: https://github.com/kinredon/umt\n\n**47. I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors**\n\n- 作者单位: 香港大学, 厦门大学, Deepwise AI Lab\n- Paper: https://arxiv.org/abs/2103.13757\n- Code: None\n\n## 自监督目标检测\n\n**48. 
There Is More Than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking With Sound by Distilling Multimodal Knowledge**\n\n- 作者单位: 弗莱堡大学\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Valverde_There_Is_More_Than_Meets_the_Eye_Self-Supervised_Multi-Object_Detection_CVPR_2021_paper.html\n- Code: http://rl.uni-freiburg.de/research/multimodal-distill\n\n**49. Instance Localization for Self-supervised Detection Pretraining**\n\n- 作者单位: 香港中文大学, 微软亚洲研究院\n- Paper: https://arxiv.org/abs/2102.08318\n- Code: https://github.com/limbo0000/InstanceLoc\n\n## 弱监督目标检测\n\n**50. Informative and Consistent Correspondence Mining for Cross-Domain Weakly Supervised Object Detection**\n\n- 作者单位: 北航, 鹏城实验室, 商汤科技\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hou_Informative_and_Consistent_Correspondence_Mining_for_Cross-Domain_Weakly_Supervised_Object_CVPR_2021_paper.html\n- Code: None\n\n**51. DAP: Detection-Aware Pre-training with Weak Supervision** \n\n- 作者单位: UIUC, 微软\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhong_DAP_Detection-Aware_Pre-Training_With_Weak_Supervision_CVPR_2021_paper.html\n- Code: None\n\n## 其他\n\n**52. Open-Vocabulary Object Detection Using Captions**\n\n- 作者单位：Snap, 哥伦比亚大学\n\n- Paper(Oral): https://openaccess.thecvf.com/content/CVPR2021/html/Zareian_Open-Vocabulary_Object_Detection_Using_Captions_CVPR_2021_paper.html\n- Code: https://github.com/alirezazareian/ovr-cnn\n\n**53. Depth From Camera Motion and Object Detection**\n\n- 作者单位:  密歇根大学, SIAI\n\n- Paper: https://arxiv.org/abs/2103.01468\n- Code: https://github.com/griffbr/ODMD\n- Dataset: https://github.com/griffbr/ODMD\n\n**54. Unsupervised Object Detection With LIDAR Clues**\n\n- 作者单位: 商汤科技, 国科大, 中科大\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Tian_Unsupervised_Object_Detection_With_LIDAR_Clues_CVPR_2021_paper.html\n- Code: None\n\n**55. 
GAIA: A Transfer Learning System of Object Detection That Fits Your Needs**\n\n- 作者单位: 国科大, 北理, 中科院, 商汤科技\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Bu_GAIA_A_Transfer_Learning_System_of_Object_Detection_That_Fits_CVPR_2021_paper.html\n- Code: https://github.com/GAIA-vision/GAIA-det\n\n**56. General Instance Distillation for Object Detection**\n\n- 作者单位: 旷视科技, 北航\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Dai_General_Instance_Distillation_for_Object_Detection_CVPR_2021_paper.html\n- Code: None\n\n**57. AQD: Towards Accurate Quantized Object Detection**\n\n- 作者单位: 蒙纳士大学, 阿德莱德大学, 华南理工大学\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_AQD_Towards_Accurate_Quantized_Object_Detection_CVPR_2021_paper.html\n- Code: https://github.com/aim-uofa/model-quantization\n\n**58. Scale-Aware Automatic Augmentation for Object Detection**\n\n- 作者单位: 香港中文大学, 字节跳动AI Lab, 思谋科技\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Scale-Aware_Automatic_Augmentation_for_Object_Detection_CVPR_2021_paper.html\n- Code: https://github.com/Jia-Research-Lab/SA-AutoAug\n\n**59. Equalization Loss v2: A New Gradient Balance Approach for Long-Tailed Object Detection**\n\n- 作者单位: 同济大学, 商汤科技, 清华大学\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Tan_Equalization_Loss_v2_A_New_Gradient_Balance_Approach_for_Long-Tailed_CVPR_2021_paper.html\n- Code: https://github.com/tztztztztz/eqlv2\n\n**60. Class-Aware Robust Adversarial Training for Object Detection**\n\n- 作者单位: 哥伦比亚大学,  中央研究院 \n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Class-Aware_Robust_Adversarial_Training_for_Object_Detection_CVPR_2021_paper.html\n- Code: None\n\n**61. 
Improved Handling of Motion Blur in Online Object Detection**\n\n- 作者单位: 伦敦大学学院\n- Homepage: http://visual.cs.ucl.ac.uk/pubs/handlingMotionBlur/\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sayed_Improved_Handling_of_Motion_Blur_in_Online_Object_Detection_CVPR_2021_paper.html\n- Code: None\n\n**62. Multiple Instance Active Learning for Object Detection**\n\n- 作者单位: 国科大, 华为诺亚\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Yuan_Multiple_Instance_Active_Learning_for_Object_Detection_CVPR_2021_paper.html\n- Code: https://github.com/yuantn/MI-AOD\n\n**63. Neural Auto-Exposure for High-Dynamic Range Object Detection**\n\n- 作者单位: Algolux, 普林斯顿大学\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Onzon_Neural_Auto-Exposure_for_High-Dynamic_Range_Object_Detection_CVPR_2021_paper.html\n- Code: None\n\n**64. Generalizable Pedestrian Detection: The Elephant in the Room**\n\n- 作者单位: IIAI, 阿尔托大学\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hasan_Generalizable_Pedestrian_Detection_The_Elephant_in_the_Room_CVPR_2021_paper.html\n- Code: https://github.com/hasanirtiza/Pedestron\n\n
<a name=\"Object-Tracking\"></a>\n\n# 单/多目标跟踪(Object Tracking)\n\n## 单目标跟踪\n\n**LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search**\n\n- Paper: https://arxiv.org/abs/2104.14545\n\n- Code: https://github.com/researchmm/LightTrack\n\n**Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark**\n\n- Homepage: https://sites.google.com/view/langtrackbenchmark/\n\n- Paper: https://arxiv.org/abs/2103.16746\n- Evaluation Toolkit: https://github.com/wangxiao5791509/TNL2K_evaluation_toolkit\n- Demo Video: https://www.youtube.com/watch?v=7lvVDlkkff0&ab_channel=XiaoWang \n\n**IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking**\n\n- Paper: https://arxiv.org/abs/2103.14938\n- Code: https://github.com/VISION-SJTU/IoUattack\n\n**Graph Attention Tracking**\n\n- Paper: https://arxiv.org/abs/2011.11204\n- Code: https://github.com/ohhhyeahhh/SiamGAT\n\n**Rotation Equivariant Siamese Networks for Tracking**\n\n- Paper: https://arxiv.org/abs/2012.13078\n- Code: None\n\n**Track to Detect and Segment: An Online Multi-Object Tracker**\n\n- Homepage: https://jialianwu.com/projects/TraDeS.html\n- Paper: None\n- Code: None\n\n**Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking**\n\n- Paper(Oral): https://arxiv.org/abs/2103.11681\n\n- Code: https://github.com/594422814/TransformerTrack\n\n**Transformer Tracking**\n\n- Paper: https://arxiv.org/abs/2103.15436\n- Code: https://github.com/chenxin-dlut/TransT\n\n## 多目标跟踪\n\n**Tracking Pedestrian Heads in Dense Crowd**\n\n- Homepage: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/\n- Paper: 
https://openaccess.thecvf.com/content/CVPR2021/html/Sundararaman_Tracking_Pedestrian_Heads_in_Dense_Crowd_CVPR_2021_paper.html\n- Code1: https://github.com/Sentient07/HeadHunter\n- Code2: https://github.com/Sentient07/HeadHunter%E2%80%93T\n- Dataset: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/\n\n**Multiple Object Tracking with Correlation Learning**\n\n- Paper: https://arxiv.org/abs/2104.03541\n- Code: None\n\n**Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking**\n\n- Paper: https://arxiv.org/abs/2012.02337\n- Code: None\n\n**Learning a Proposal Classifier for Multiple Object Tracking**\n\n- Paper: https://arxiv.org/abs/2103.07889\n- Code: https://github.com/daip13/LPC_MOT.git\n\n**Track to Detect and Segment: An Online Multi-Object Tracker**\n\n- Homepage: https://jialianwu.com/projects/TraDeS.html\n- Paper: https://arxiv.org/abs/2103.08808\n- Code: https://github.com/JialianW/TraDeS\n\n<a name=\"Semantic-Segmentation\"></a>\n\n# 语义分割(Semantic Segmentation)\n\n**1. HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation**\n\n- 作者单位: Facebook AI, 巴伊兰大学, 特拉维夫大学\n\n- Homepage: https://nirkin.com/hyperseg/\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Nirkin_HyperSeg_Patch-Wise_Hypernetwork_for_Real-Time_Semantic_Segmentation_CVPR_2021_paper.pdf\n\n- Code: https://github.com/YuvalNirkin/hyperseg\n\n**2. Rethinking BiSeNet For Real-time Semantic Segmentation**\n\n- 作者单位: 美团\n\n- Paper: https://arxiv.org/abs/2104.13188\n\n- Code: https://github.com/MichaelFan01/STDC-Seg\n\n**3. Progressive Semantic Segmentation**\n\n- 作者单位: VinAI Research, VinUniversity, 阿肯色大学, 石溪大学\n- Paper: https://arxiv.org/abs/2104.03778\n- Code: https://github.com/VinAIResearch/MagNet\n\n**4. 
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers**\n\n- 作者单位: 复旦大学, 牛津大学, 萨里大学, 腾讯优图, Facebook AI\n- Homepage: https://fudan-zvg.github.io/SETR\n- Paper: https://arxiv.org/abs/2012.15840\n- Code: https://github.com/fudan-zvg/SETR\n\n**5. Capturing Omni-Range Context for Omnidirectional Segmentation**\n\n- 作者单位: 卡尔斯鲁厄理工学院, 卡尔·蔡司, 华为\n- Paper: https://arxiv.org/abs/2103.05687\n- Code: None\n\n**6. Learning Statistical Texture for Semantic Segmentation**\n\n- 作者单位: 北航, 商汤科技\n- Paper: https://arxiv.org/abs/2103.04133\n- Code: None\n\n**7. InverseForm: A Loss Function for Structured Boundary-Aware Segmentation**\n\n- 作者单位: 高通AI研究院\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Borse_InverseForm_A_Loss_Function_for_Structured_Boundary-Aware_Segmentation_CVPR_2021_paper.html\n- Code: None\n\n**8. DCNAS: Densely Connected Neural Architecture Search for Semantic Image Segmentation**\n\n- 作者单位: Joyy Inc, 快手, 北航等\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhang_DCNAS_Densely_Connected_Neural_Architecture_Search_for_Semantic_Image_Segmentation_CVPR_2021_paper.html\n- Code: None\n\n## 弱监督语义分割\n\n**9. Railroad Is Not a Train: Saliency As Pseudo-Pixel Supervision for Weakly Supervised Semantic Segmentation**\n\n- 作者单位: 延世大学, 成均馆大学\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Lee_Railroad_Is_Not_a_Train_Saliency_As_Pseudo-Pixel_Supervision_for_CVPR_2021_paper.html\n- Code: https://github.com/halbielee/EPS\n\n**10. Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation**\n\n- 作者单位: 延世大学\n- Homepage:  https://cvlab.yonsei.ac.kr/projects/BANA/ \n- Paper: https://arxiv.org/abs/2104.00905\n- Code: None\n\n**11. Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation**\n\n- 作者单位: 南京理工大学, MBZUAI, 电子科技大学, 阿德莱德大学, 悉尼科技大学\n\n- Paper: https://arxiv.org/abs/2103.14581\n- Code: https://github.com/NUST-Machine-Intelligence-Laboratory/nsrom\n\n**12. 
Embedded Discriminative Attention Mechanism for Weakly Supervised Semantic Segmentation**\n\n- 作者单位: 北京理工大学, 美团\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wu_Embedded_Discriminative_Attention_Mechanism_for_Weakly_Supervised_Semantic_Segmentation_CVPR_2021_paper.html\n- Code: https://github.com/allenwu97/EDAM\n\n**13. BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation**\n\n- 作者单位: 首尔大学\n- Paper: https://arxiv.org/abs/2103.08907\n- Code: https://github.com/jbeomlee93/BBAM\n\n## 半监督语义分割\n\n**14. Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision**\n\n- 作者单位: 北京大学, 微软亚洲研究院\n- Paper: https://arxiv.org/abs/2106.01226\n- Code: https://github.com/charlesCXK/TorchSemiSeg\n\n**15. Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation**\n\n- 作者单位: 华为, 大连理工大学, 北京大学\n- Paper: https://arxiv.org/abs/2103.04705\n- Code: None\n\n**16. Semi-Supervised Semantic Segmentation With Directional Context-Aware Consistency**\n\n- 作者单位: 香港中文大学, 思谋科技, 牛津大学\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Lai_Semi-Supervised_Semantic_Segmentation_With_Directional_Context-Aware_Consistency_CVPR_2021_paper.html\n- Code: None\n\n**17. Semantic Segmentation With Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization**\n\n- 作者单位: NVIDIA, 多伦多大学, 耶鲁大学, MIT, Vector Institute\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Semantic_Segmentation_With_Generative_Models_Semi-Supervised_Learning_and_Strong_Out-of-Domain_CVPR_2021_paper.html\n- Code: https://nv-tlabs.github.io/semanticGAN/\n\n**18. 
Three Ways To Improve Semantic Segmentation With Self-Supervised Depth Estimation**\n\n- 作者单位: ETH Zurich, 伯恩大学, 鲁汶大学\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hoyer_Three_Ways_To_Improve_Semantic_Segmentation_With_Self-Supervised_Depth_Estimation_CVPR_2021_paper.html\n- Code: https://github.com/lhoyer/improving_segmentation_with_selfsupervised_depth\n\n## 域自适应语义分割\n\n**19. Cluster, Split, Fuse, and Update: Meta-Learning for Open Compound Domain Adaptive Semantic Segmentation**\n\n- 作者单位: ETH Zurich, 鲁汶大学, 电子科技大学\n\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Gong_Cluster_Split_Fuse_and_Update_Meta-Learning_for_Open_Compound_Domain_CVPR_2021_paper.html\n- Code: None\n\n**20. Source-Free Domain Adaptation for Semantic Segmentation**\n\n- 作者单位: 华东师范大学\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_Source-Free_Domain_Adaptation_for_Semantic_Segmentation_CVPR_2021_paper.html\n- Code: None\n\n**21. Uncertainty Reduction for Model Adaptation in Semantic Segmentation**\n\n- 作者单位: Idiap Research Institute, EPFL, 日内瓦大学\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/S_Uncertainty_Reduction_for_Model_Adaptation_in_Semantic_Segmentation_CVPR_2021_paper.html\n- Code: https://git.io/JthPp\n\n**22. Self-Supervised Augmentation Consistency for Adapting Semantic Segmentation**\n\n- 作者单位: 达姆施塔特工业大学, hessian.AI\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Araslanov_Self-Supervised_Augmentation_Consistency_for_Adapting_Semantic_Segmentation_CVPR_2021_paper.html\n- Code: https://github.com/visinf/da-sac\n\n**23. RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening**\n\n- 作者单位: LG AI研究院, KAIST等\n- Paper: https://arxiv.org/abs/2103.15597\n- Code: https://github.com/shachoi/RobustNet\n\n**24. 
Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization**\n\n- 作者单位: 香港大学, 深睿医疗\n- Paper: https://arxiv.org/abs/2103.13041\n- Code: None\n\n**25. MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation**\n\n- 作者单位: 香港城市大学, 百度\n- Paper: https://arxiv.org/abs/2103.05254\n- Code: https://github.com/cyang-cityu/MetaCorrection\n\n**26. Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation**\n\n- 作者单位: 华为云, 华为诺亚, 大连理工大学\n- Paper: https://arxiv.org/abs/2103.04717\n- Code: None\n\n**27. Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation**\n\n- 作者单位: 中国科学技术大学, 微软亚洲研究院\n- Paper: https://arxiv.org/abs/2101.10979\n- Code: https://github.com/microsoft/ProDA\n\n**28. DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation**\n\n- 作者单位: 南卡罗来纳大学, 天远视科技\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wu_DANNet_A_One-Stage_Domain_Adaptation_Network_for_Unsupervised_Nighttime_Semantic_CVPR_2021_paper.html\n- Code: https://github.com/W-zx-Y/DANNet\n\n## Few-Shot语义分割\n\n**29. Scale-Aware Graph Neural Network for Few-Shot Semantic Segmentation**\n\n- 作者单位: MBZUAI, IIAI, 哈工大\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Xie_Scale-Aware_Graph_Neural_Network_for_Few-Shot_Semantic_Segmentation_CVPR_2021_paper.html\n- Code: None\n\n**30. Anti-Aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation**\n\n- 作者单位: 国科大, 清华大学\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_Anti-Aliasing_Semantic_Reconstruction_for_Few-Shot_Semantic_Segmentation_CVPR_2021_paper.html\n- Code: https://github.com/Bibkiller/ASR \n\n## 无监督语义分割\n\n**31. 
PiCIE: Unsupervised Semantic Segmentation Using Invariance and Equivariance in Clustering**\n\n- 作者单位: UT-Austin, 康奈尔大学\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Cho_PiCIE_Unsupervised_Semantic_Segmentation_Using_Invariance_and_Equivariance_in_Clustering_CVPR_2021_paper.html\n- Code: https://github.com/janghyuncho/PiCIE\n\n## 视频语义分割\n\n**32. VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild**\n\n- 作者单位: 浙江大学, 百度, 悉尼科技大学\n- Homepage: https://www.vspwdataset.com/\n- Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf\n- GitHub: https://github.com/sssdddwww2/vspw_dataset_download\n\n## 其它\n\n**33. Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations**\n\n- 作者单位: 帕多瓦大学\n\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Michieli_Continual_Semantic_Segmentation_via_Repulsion-Attraction_of_Sparse_and_Disentangled_Latent_CVPR_2021_paper.html\n- Code: https://lttm.dei.unipd.it/paper_data/SDR/\n\n**34. Exploit Visual Dependency Relations for Semantic Segmentation**\n\n- 作者单位: 伊利诺伊大学芝加哥分校\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_Exploit_Visual_Dependency_Relations_for_Semantic_Segmentation_CVPR_2021_paper.html\n- Code: None\n\n**35. Revisiting Superpixels for Active Learning in Semantic Segmentation With Realistic Annotation Costs**\n\n- 作者单位: Institute for Infocomm Research, 新加坡国立大学\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Cai_Revisiting_Superpixels_for_Active_Learning_in_Semantic_Segmentation_With_Realistic_CVPR_2021_paper.html\n- Code: None\n\n**36. PLOP: Learning without Forgetting for Continual Semantic Segmentation**\n\n- 作者单位: 索邦大学, Heuritech, Datakalab, Valeo.ai \n- Paper: https://arxiv.org/abs/2011.11390\n- Code: https://github.com/arthurdouillard/CVPR2021_PLOP\n\n**37. 
3D-to-2D Distillation for Indoor Scene Parsing**\n\n- 作者单位: 香港中文大学, 香港大学\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Liu_3D-to-2D_Distillation_for_Indoor_Scene_Parsing_CVPR_2021_paper.html\n- Code: None\n\n**38. Bidirectional Projection Network for Cross Dimension Scene Understanding**\n\n- 作者单位: 香港中文大学, 牛津大学等\n- Paper(Oral): https://arxiv.org/abs/2103.14326\n- Code: https://github.com/wbhu/BPNet\n\n**39. PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation**\n\n- 作者单位: 北京大学, 中科院, 国科大, ETH Zurich, 商汤科技等\n\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_PointFlow_Flowing_Semantics_Through_Points_for_Aerial_Image_Segmentation_CVPR_2021_paper.html\n- Code: https://github.com/lxtGH/PFSegNets\n\n<a name=\"Instance-Segmentation\"></a>\n\n# 实例分割(Instance Segmentation)\n\n**DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation**\n\n- Paper: https://arxiv.org/abs/2011.09876\n- Code: https://github.com/aliyun/DCT-Mask\n\n**Incremental Few-Shot Instance Segmentation**\n\n- Paper: https://arxiv.org/abs/2105.05312\n- Code: https://github.com/danganea/iMTFA\n\n**A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation**\n\n- Paper: https://arxiv.org/abs/2105.03186\n- Code: None\n\n**RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features**\n\n- Paper: https://arxiv.org/abs/2104.08569\n- Code: https://github.com/zhanggang001/RefineMask/\n\n**Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation**\n\n- Paper: https://arxiv.org/abs/2104.05239\n- Code:  https://github.com/tinyalpha/BPR \n\n**Multi-Scale Aligned Distillation for Low-Resolution Detection**\n\n- Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf\n\n- Code: https://github.com/Jia-Research-Lab/MSAD\n\n**Boundary IoU: Improving Object-Centric Image Segmentation Evaluation**\n\n- Homepage: https://bowenc0221.github.io/boundary-iou/\n- Paper: 
https://arxiv.org/abs/2103.16562\n\n- Code: https://github.com/bowenc0221/boundary-iou-api\n\n**Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers**\n\n- Paper: https://arxiv.org/abs/2103.12340\n\n- Code: https://github.com/lkeab/BCNet \n\n**Zero-shot instance segmentation（Not Sure）**\n\n- Paper: None\n- Code: https://github.com/CVPR2021-pape-id-1395/CVPR2021-paper-id-1395\n\n## 视频实例分割\n\n**STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation**\n\n- Paper: http://www4.comp.polyu.edu.hk/~cslzhang/papers.htm\n- Code: https://github.com/MinghanLi/STMask\n\n**End-to-End Video Instance Segmentation with Transformers**\n\n- Paper(Oral): https://arxiv.org/abs/2011.14503\n- Code: https://github.com/Epiphqny/VisTR\n\n<a name=\"Panoptic-Segmentation\"></a>\n\n# 全景分割(Panoptic Segmentation)\n\n**ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation**\n\n- Paper: https://arxiv.org/abs/2012.05258\n- Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab\n- Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab\n\n**Part-aware Panoptic Segmentation**\n\n- Paper: https://arxiv.org/abs/2106.06351\n- Code: https://github.com/tue-mps/panoptic_parts\n- Dataset: https://github.com/tue-mps/panoptic_parts\n\n**Exemplar-Based Open-Set Panoptic Segmentation Network**\n\n- Homepage: https://cv.snu.ac.kr/research/EOPSN/\n- Paper: https://arxiv.org/abs/2105.08336\n- Code: https://github.com/jd730/EOPSN\n\n**MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers**\n\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_MaX-DeepLab_End-to-End_Panoptic_Segmentation_With_Mask_Transformers_CVPR_2021_paper.html\n- Code: None\n\n**Panoptic Segmentation Forecasting**\n\n- Paper: https://arxiv.org/abs/2104.03962\n- Code: https://github.com/nianticlabs/panoptic-forecasting\n\n**Fully Convolutional Networks for Panoptic Segmentation**\n\n- Paper: 
https://arxiv.org/abs/2012.00720\n\n- Code: https://github.com/yanwei-li/PanopticFCN\n\n**Cross-View Regularization for Domain Adaptive Panoptic Segmentation**\n\n- Paper: https://arxiv.org/abs/2103.02584\n- Code: None\n\n<a name=\"Medical-Image-Segmentation\"></a>\n\n# 医学图像分割\n\n**1. Learning Calibrated Medical Image Segmentation via Multi-Rater Agreement Modeling**\n\n- 作者单位: 腾讯天衍实验室, 北京同仁医院\n- Paper(Best Paper Candidate): https://openaccess.thecvf.com/content/CVPR2021/html/Ji_Learning_Calibrated_Medical_Image_Segmentation_via_Multi-Rater_Agreement_Modeling_CVPR_2021_paper.html\n- Code: https://github.com/jiwei0921/MRNet/\n\n**2. Every Annotation Counts: Multi-Label Deep Supervision for Medical Image Segmentation**\n\n- 作者单位: 卡尔斯鲁厄理工学院, 卡尔·蔡司等\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Reiss_Every_Annotation_Counts_Multi-Label_Deep_Supervision_for_Medical_Image_Segmentation_CVPR_2021_paper.html\n- Code: None\n\n**3. FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space**\n\n- 作者单位: 香港中文大学, 香港理工大学\n- Paper: https://arxiv.org/abs/2103.06030\n- Code: https://github.com/liuquande/FedDG-ELCFS\n\n**4. DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation**\n\n- 作者单位: 约翰斯·霍普金斯大学, NVIDIA\n- Paper(Oral): https://arxiv.org/abs/2103.15954\n- Code: None\n\n**5. 
DARCNN: Domain Adaptive Region-Based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images**\n\n- 作者单位: 斯坦福大学\n\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hsu_DARCNN_Domain_Adaptive_Region-Based_Convolutional_Neural_Network_for_Unsupervised_Instance_CVPR_2021_paper.html\n- Code: None\n\n<a name=\"VOS\"></a>\n\n# 视频目标分割(Video-Object-Segmentation)\n\n**Learning Position and Target Consistency for Memory-based Video Object Segmentation**\n\n- Paper: https://arxiv.org/abs/2104.04329\n- Code: None\n\n**SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation**\n\n- Paper(Oral): https://arxiv.org/abs/2101.08833\n- Code: https://github.com/dukebw/SSTVOS\n\n<a name=\"IVOS\"></a>\n\n# 交互式视频目标分割(Interactive-Video-Object-Segmentation)\n\n**Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion**\n\n- Homepage: https://hkchengrex.github.io/MiVOS/\n\n- Paper: https://arxiv.org/abs/2103.07941\n\n- Code: https://github.com/hkchengrex/MiVOS\n- Demo: https://hkchengrex.github.io/MiVOS/video.html#partb\n\n**Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild**\n\n- Paper: https://arxiv.org/abs/2103.10391\n\n- Code: https://github.com/svip-lab/IVOS-W\n\n<a name=\"Saliency-Detection\"></a>\n\n# 显著性检测(Saliency Detection)\n\n**Uncertainty-aware Joint Salient Object and Camouflaged Object Detection**\n\n- Paper: https://arxiv.org/abs/2104.02628\n\n- Code: https://github.com/JingZhang617/Joint_COD_SOD\n\n**Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion**\n\n- Paper(Oral): https://arxiv.org/abs/2103.11832\n- Code: https://github.com/sunpeng1996/DSA2F\n\n<a name=\"Camouflaged-Object-Detection\"></a>\n\n# 伪装物体检测(Camouflaged Object Detection)\n\n**Uncertainty-aware Joint Salient Object and Camouflaged Object Detection**\n\n- Paper: https://arxiv.org/abs/2104.02628\n\n- Code: 
https://github.com/JingZhang617/Joint_COD_SOD\n\n<a name=\"CoSOD\"></a>\n\n# 协同显著性检测(Co-Salient Object Detection)\n\n**Group Collaborative Learning for Co-Salient Object Detection**\n\n- Paper: https://arxiv.org/abs/2104.01108\n- Code: https://github.com/fanq15/GCoNet\n\n<a name=\"Matting\"></a>\n\n# 图像抠图(Image Matting)\n\n**Semantic Image Matting**\n\n- Paper: https://arxiv.org/abs/2104.08201\n- Code: https://github.com/nowsyn/SIM\n- Dataset: https://github.com/nowsyn/SIM\n\n<a name=\"Re-ID\"></a>\n\n# 行人重识别(Person Re-identification)\n\n**Generalizable Person Re-identification with Relevance-aware Mixture of Experts**\n\n- Paper: https://arxiv.org/abs/2105.09156\n- Code: None\n\n**Unsupervised Multi-Source Domain Adaptation for Person Re-Identification**\n\n- Paper: https://arxiv.org/abs/2104.12961\n- Code: None\n\n**Combined Depth Space based Architecture Search For Person Re-identification**\n\n- Paper: https://arxiv.org/abs/2104.04163\n- Code: None\n\n<a name=\"Person-Search\"></a>\n\n# 行人搜索(Person Search)\n\n**Anchor-Free Person Search**\n\n- Paper: https://arxiv.org/abs/2103.11617\n- Code: https://github.com/daodaofr/AlignPS\n- Interpretation: [首个无需锚框（Anchor-Free）的行人搜索框架 | CVPR 2021](https://mp.weixin.qq.com/s/iqJkgp0JBanmeBPyHUkb-A)\n\n<a name=\"Video-Understanding\"></a>\n\n# 视频理解/行为识别(Video Understanding)\n\n**Temporal-Relational CrossTransformers for Few-Shot Action Recognition**\n\n- Paper: https://arxiv.org/abs/2101.06184\n- Code: https://github.com/tobyperrett/trx\n\n**FrameExit: Conditional Early Exiting for Efficient Video Recognition**\n\n- Paper(Oral): https://arxiv.org/abs/2104.13400\n- Code: None\n\n**No frame left behind: Full Video Action Recognition**\n\n- Paper: https://arxiv.org/abs/2103.15395\n- Code: None\n\n**Learning Salient Boundary Feature for Anchor-free Temporal Action Localization**\n\n- Paper: https://arxiv.org/abs/2103.13137\n- Code: None\n\n**Temporal Context Aggregation Network for Temporal Action Proposal Refinement**\n\n- 
Paper: https://arxiv.org/abs/2103.13141\n- Code: None\n- Interpretation: [CVPR 2021 | TCANet：最强时序动作提名修正网络](https://mp.weixin.qq.com/s/UOWMfpTljkyZznHtpkQBhA)\n\n**ACTION-Net: Multipath Excitation for Action Recognition**\n\n- Paper: https://arxiv.org/abs/2103.07372\n- Code: https://github.com/V-Sense/ACTION-Net\n\n**Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning**\n\n- Homepage: https://fingerrec.github.io/index_files/jinpeng/papers/CVPR2021/project_website.html\n- Paper: https://arxiv.org/abs/2009.05769\n- Code: https://github.com/FingerRec/BE\n\n**TDN: Temporal Difference Networks for Efficient Action Recognition**\n\n- Paper: https://arxiv.org/abs/2012.10071\n- Code: https://github.com/MCG-NJU/TDN\n\n<a name=\"Face-Recognition\"></a>\n\n# 人脸识别(Face Recognition)\n\n**A 3D GAN for Improved Large-pose Facial Recognition**\n\n- Paper: https://arxiv.org/abs/2012.10545\n- Code: None\n\n**MagFace: A Universal Representation for Face Recognition and Quality Assessment**\n\n- Paper(Oral): https://arxiv.org/abs/2103.06627\n- Code: https://github.com/IrvingMeng/MagFace\n\n**WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition**\n\n- Homepage: https://www.face-benchmark.org/\n- Paper: https://arxiv.org/abs/2103.04098 \n- Dataset: https://www.face-benchmark.org/\n\n**When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework**\n\n- Paper(Oral): https://arxiv.org/abs/2103.01520\n- Code: https://github.com/Hzzone/MTLFace\n- Dataset: https://github.com/Hzzone/MTLFace\n\n<a name=\"Face-Detection\"></a>\n\n# 人脸检测(Face Detection)\n\n**HLA-Face: Joint High-Low Adaptation for Low Light Face Detection**\n\n- Homepage: https://daooshee.github.io/HLA-Face-Website/\n- Paper: https://arxiv.org/abs/2104.01984\n- Code: https://github.com/daooshee/HLA-Face-Code\n\n**CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement**\n\n- Paper: 
https://arxiv.org/abs/2103.07017\n- Code: None\n\n<a name=\"Face-Anti-Spoofing\"></a>\n\n# 人脸活体检测(Face Anti-Spoofing)\n\n**Cross Modal Focal Loss for RGBD Face Anti-Spoofing**\n\n- Paper: https://arxiv.org/abs/2103.00948\n- Code: None\n\n<a name=\"Deepfake-Detection\"></a>\n\n# Deepfake检测(Deepfake Detection)\n\n**Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain**\n\n- Paper: https://arxiv.org/abs/2103.01856\n- Code: None\n\n**Multi-attentional Deepfake Detection**\n\n- Paper: https://arxiv.org/abs/2103.02406\n- Code: None\n\n<a name=\"Age-Estimation\"></a>\n\n# 人脸年龄估计(Age Estimation)\n\n**Continuous Face Aging via Self-estimated Residual Age Embedding**\n\n- Paper: https://arxiv.org/abs/2105.00020\n- Code: None\n\n**PML: Progressive Margin Loss for Long-tailed Age Classification**\n\n- Paper: https://arxiv.org/abs/2103.02140\n- Code: None\n\n<a name=\"FER\"></a>\n\n# 人脸表情识别(Facial Expression Recognition)\n\n**Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition**\n\n- Paper: https://arxiv.org/abs/2103.13372\n- Code: None\n\n<a name=\"Deepfakes\"></a>\n\n# Deepfakes\n\n**MagDR: Mask-guided Detection and Reconstruction for Defending Deepfakes**\n\n- Paper: https://arxiv.org/abs/2103.14211\n- Code: None\n\n<a name=\"Human-Parsing\"></a>\n\n# 人体解析(Human Parsing)\n\n**Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing**\n\n- Paper: https://arxiv.org/abs/2103.04570\n- Code: https://github.com/tfzhou/MG-HumanParsing\n\n<a name=\"Human-Pose-Estimation\"></a>\n\n# 2D/3D人体姿态估计(2D/3D Human Pose Estimation)\n\n## 2D 人体姿态估计\n\n**ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search**\n\n- Paper: https://arxiv.org/abs/2105.10154\n- Code: None\n\n**When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks**\n\n- Paper: https://arxiv.org/abs/2105.06152\n- Code: None\n\n**Pose Recognition with 
Cascade Transformers**\n\n- Paper: https://arxiv.org/abs/2104.06976\n\n- Code: https://github.com/mlpc-ucsd/PRTR\n\n**DCPose: Deep Dual Consecutive Network for Human Pose Estimation**\n\n- Paper: https://arxiv.org/abs/2103.07254\n- Code: https://github.com/Pose-Group/DCPose\n\n## 3D 人体姿态估计\n\n**End-to-End Human Pose and Mesh Reconstruction with Transformers**\n\n- Paper: https://arxiv.org/abs/2012.09760\n- Code: https://github.com/microsoft/MeshTransformer\n\n**PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation**\n\n- Paper(Oral): https://arxiv.org/abs/2105.02465\n\n- Code: https://github.com/jfzhang95/PoseAug\n\n**Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration**\n\n- Paper: https://arxiv.org/abs/2103.02845\n- Code: https://github.com/SeanChenxy/HandMesh\n\n**Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks**\n\n- Paper: https://arxiv.org/abs/2104.01797\n- Code: https://github.com/3dpose/3D-Multi-Person-Pose\n\n**HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation**\n\n- Homepage: https://jeffli.site/HybrIK/ \n- Paper: https://arxiv.org/abs/2011.14672\n- Code: https://github.com/Jeff-sjtu/HybrIK\n\n<a name=\"Animal-Pose-Estimation\"></a>\n\n# 动物姿态估计(Animal Pose Estimation)\n\n**From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation**\n\n- Paper: https://arxiv.org/abs/2103.14843\n- Code: None\n\n<a name=\"Hand-Pose-Estimation\"></a>\n\n# 手部姿态估计(Hand Pose Estimation)\n\n**Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time**\n\n- Homepage: https://stevenlsw.github.io/Semi-Hand-Object/\n- Paper: https://arxiv.org/abs/2106.05266\n- Code: https://github.com/stevenlsw/Semi-Hand-Object\n\n<a name=\"Human-Volumetric-Capture\"></a>\n\n# Human Volumetric Capture\n\n**POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture**\n\n- Homepage: 
http://www.liuyebin.com/posefusion/posefusion.html\n\n- Paper(Oral): https://arxiv.org/abs/2103.15331\n- Code: None\n\n<a name=\"Scene-Text-Detection\"></a>\n\n# 场景文本检测(Scene Text Detection)\n\n**Fourier Contour Embedding for Arbitrary-Shaped Text Detection**\n\n- Paper: https://arxiv.org/abs/2104.10442\n- Code: None\n\n<a name=\"Scene-Text-Recognition\"></a>\n\n# 场景文本识别(Scene Text Recognition)\n\n**Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition**\n\n- Paper: https://arxiv.org/abs/2103.06495\n- Code: https://github.com/FangShancheng/ABINet\n\n<a name=\"Image-Compression\"></a>\n\n# 图像压缩\n\n**Checkerboard Context Model for Efficient Learned Image Compression**\n\n- Paper: https://arxiv.org/abs/2103.15306\n- Code: None\n\n**Slimmable Compressive Autoencoders for Practical Neural Image Compression**\n\n- Paper: https://arxiv.org/abs/2103.15726\n- Code: None\n\n**Attention-guided Image Compression by Deep Reconstruction of Compressive Sensed Saliency Skeleton**\n\n- Paper: https://arxiv.org/abs/2103.15368\n- Code: None\n\n<a name=\"Model-Compression\"></a>\n\n# 模型压缩/剪枝/量化\n\n**Teachers Do More Than Teach: Compressing Image-to-Image Models**\n\n- Paper: https://arxiv.org/abs/2103.03467\n- Code: https://github.com/snap-research/CAT\n\n## 模型剪枝\n\n**Dynamic Slimmable Network**\n\n- Paper: https://arxiv.org/abs/2103.13258\n- Code: https://github.com/changlin31/DS-Net\n\n## 模型量化\n\n**Network Quantization with Element-wise Gradient Scaling**\n\n- Paper: https://arxiv.org/abs/2104.00903\n- Code: None\n\n**Zero-shot Adversarial Quantization**\n\n- Paper(Oral): https://arxiv.org/abs/2103.15263\n- Code: https://git.io/Jqc0y\n\n**Learnable Companding Quantization for Accurate Low-bit Neural Networks**\n\n- Paper: https://arxiv.org/abs/2103.07156\n- Code: None\n\n<a name=\"KD\"></a>\n\n# 知识蒸馏(Knowledge Distillation)\n\n**Distilling Knowledge via Knowledge Review**\n\n- Paper: https://arxiv.org/abs/2104.09044\n- Code: 
https://github.com/Jia-Research-Lab/ReviewKD\n\n**Distilling Object Detectors via Decoupled Features**\n\n- Paper: https://arxiv.org/abs/2103.14475\n- Code: https://github.com/ggjy/DeFeat.pytorch\n\n<a name=\"Super-Resolution\"></a>\n\n# 超分辨率(Super-Resolution)\n\n**Image Super-Resolution with Non-Local Sparse Attention**\n\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Mei_Image_Super-Resolution_With_Non-Local_Sparse_Attention_CVPR_2021_paper.pdf\n- Code: https://github.com/HarukiYqM/Non-Local-Sparse-Attention\n\n**Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline**\n\n- Homepage: http://mepro.bjtu.edu.cn/resource.html\n- Paper: https://arxiv.org/abs/2104.06174\n- Code: None\n\n**ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic**\n\n- Paper: https://arxiv.org/abs/2103.04039\n- Code: https://github.com/Xiangtaokong/ClassSR\n\n**AdderSR: Towards Energy Efficient Image Super-Resolution**\n\n- Paper: https://arxiv.org/abs/2009.08891\n- Code: None\n\n## 视频超分辨率\n\n**Temporal Modulation Network for Controllable Space-Time Video Super-Resolution**\n\n- Paper: None\n- Code: https://github.com/CS-GangXu/TMNet\n\n<a name=\"Dehazing\"></a>\n\n# 去雾(Dehazing)\n\n**Contrastive Learning for Compact Single Image Dehazing**\n\n- Paper: https://arxiv.org/abs/2104.09367\n- Code: https://github.com/GlassyWu/AECR-Net\n\n<a name=\"Image-Restoration\"></a>\n\n# 图像恢复(Image Restoration)\n\n**Multi-Stage Progressive Image Restoration**\n\n- Paper: https://arxiv.org/abs/2102.02808\n- Code: https://github.com/swz30/MPRNet\n\n<a name=\"Image-Inpainting\"></a>\n\n# 图像补全(Image Inpainting)\n\n**PD-GAN: Probabilistic Diverse GAN for Image Inpainting**\n\n- Paper: https://arxiv.org/abs/2105.02201\n- Code: https://github.com/KumapowerLIU/PD-GAN\n\n**TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations**\n\n- Homepage: 
https://yzhouas.github.io/projects/TransFill/index.html\n- Paper: https://arxiv.org/abs/2103.15982\n- Code: None\n\n<a name=\"Image-Editing\"></a>\n\n# 图像编辑(Image Editing)\n\n**StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing**\n\n- Paper: https://arxiv.org/abs/2104.14754\n- Code: https://github.com/naver-ai/StyleMapGAN\n- Demo Video: https://youtu.be/qCapNyRA_Ng\n\n**High-Fidelity and Arbitrary Face Editing**\n\n- Paper: https://arxiv.org/abs/2103.15814\n- Code: None\n\n**Anycost GANs for Interactive Image Synthesis and Editing**\n\n- Paper: https://arxiv.org/abs/2103.03243\n- Code: https://github.com/mit-han-lab/anycost-gan\n\n**PISE: Person Image Synthesis and Editing with Decoupled GAN**\n\n- Paper: https://arxiv.org/abs/2103.04023\n- Code: https://github.com/Zhangjinso/PISE\n\n**DeFLOCNet: Deep Image Editing via Flexible Low-level Controls**\n\n- Paper: http://raywzy.com/\n- Code: http://raywzy.com/\n\n<a name=\"Image-Captioning\"></a>\n\n# 图像描述(Image Captioning)\n\n**Towards Accurate Text-based Image Captioning with Content Diversity Exploration**\n\n- Paper: https://arxiv.org/abs/2105.03236\n- Code: None\n\n<a name=\"Font-Generation\"></a>\n\n# 字体生成(Font Generation)\n\n**DG-Font: Deformable Generative Networks for Unsupervised Font Generation**\n\n- Paper: https://arxiv.org/abs/2104.03064\n\n- Code: https://github.com/ecnuycxie/DG-Font\n\n<a name=\"Image-Matching\"></a>\n\n# 图像匹配(Image Matching)\n\n**LoFTR: Detector-Free Local Feature Matching with Transformers**\n\n- Homepage: https://zju3dv.github.io/loftr/\n- Paper: https://arxiv.org/abs/2104.00680\n- Code: https://github.com/zju3dv/LoFTR\n\n**Convolutional Hough Matching Networks**\n\n- Homepage: http://cvlab.postech.ac.kr/research/CHM/\n- Paper(Oral): https://arxiv.org/abs/2103.16831\n- Code: None\n\n<a name=\"Image-Blending\"></a>\n\n# 图像融合(Image 
Blending)\n\n**Bridging the Visual Gap: Wide-Range Image Blending**\n\n- Paper: https://arxiv.org/abs/2103.15149\n\n- Code: https://github.com/julia0607/Wide-Range-Image-Blending\n\n<a name=\"Reflection-Removal\"></a>\n\n# 反光去除(Reflection Removal)\n\n**Robust Reflection Removal with Reflection-free Flash-only Cues**\n\n- Paper: https://arxiv.org/abs/2103.04273\n- Code: https://github.com/ChenyangLEI/flash-reflection-removal\n\n<a name=\"3D-C\"></a>\n\n# 3D点云分类(3D Point Clouds Classification)\n\n**Equivariant Point Network for 3D Point Cloud Analysis**\n\n- Paper: https://arxiv.org/abs/2103.14147\n- Code: None\n\n**PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds**\n\n- Paper: https://arxiv.org/abs/2103.14635\n- Code: https://github.com/CVMI-Lab/PAConv\n\n<a name=\"3D-Object-Detection\"></a>\n\n# 3D目标检测(3D Object Detection)\n\n**3D-MAN: 3D Multi-frame Attention Network for Object Detection**\n\n- Paper: https://arxiv.org/abs/2103.16054\n- Code: None\n\n**Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds**\n\n- Paper: https://arxiv.org/abs/2104.06114\n- Code: https://github.com/cheng052/BRNet\n\n**HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection**\n\n- Homepage:  https://cvlab.yonsei.ac.kr/projects/HVPR/ \n\n- Paper: https://arxiv.org/abs/2104.00902\n- Code:  https://github.com/cvlab-yonsei/HVPR \n\n**LiDAR R-CNN: An Efficient and Universal 3D Object Detector**\n\n- Paper: https://arxiv.org/abs/2103.15297\n- Code: https://github.com/tusimple/LiDAR_RCNN\n\n**M3DSSD: Monocular 3D Single Stage Object Detector**\n\n- Paper: https://arxiv.org/abs/2103.13164\n\n- Code: https://github.com/mumianyuxin/M3DSSD\n\n**SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud**\n\n- Paper: None\n- Code: https://github.com/Vegeta2020/SE-SSD\n\n**Center-based 3D Object Detection and Tracking**\n\n- Paper: https://arxiv.org/abs/2006.11275\n- Code: 
https://github.com/tianweiy/CenterPoint\n\n**Categorical Depth Distribution Network for Monocular 3D Object Detection**\n\n- Paper: https://arxiv.org/abs/2103.01100\n- Code: None\n\n<a name=\"3D-Semantic-Segmentation\"></a>\n\n# 3D语义分割(3D Semantic Segmentation)\n\n**Bidirectional Projection Network for Cross Dimension Scene Understanding**\n\n- Paper(Oral): https://arxiv.org/abs/2103.14326\n- Code: https://github.com/wbhu/BPNet\n\n**Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion**\n\n- Paper: https://arxiv.org/abs/2103.07074\n- Code: https://github.com/ShiQiu0419/BAAF-Net\n\n**Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation**\n\n- Paper: https://arxiv.org/abs/2011.10033\n- Code: https://github.com/xinge008/Cylinder3D\n\n**Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges**\n\n- Homepage: https://github.com/QingyongHu/SensatUrban\n- Paper: http://arxiv.org/abs/2009.03137\n- Code: https://github.com/QingyongHu/SensatUrban\n- Dataset: https://github.com/QingyongHu/SensatUrban\n\n<a name=\"3D-Panoptic-Segmentation\"></a>\n\n# 3D全景分割(3D Panoptic Segmentation)\n\n**Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation**\n\n- Paper: https://arxiv.org/abs/2103.14962\n- Code: https://github.com/edwardzhou130/Panoptic-PolarNet\n\n<a name=\"3D-Object-Tracking\"></a>\n\n# 3D目标跟踪(3D Object Tracking)\n\n**Center-based 3D Object Detection and Tracking**\n\n- Paper: https://arxiv.org/abs/2006.11275\n- Code: https://github.com/tianweiy/CenterPoint\n\n<a name=\"3D-PointCloud-Registration\"></a>\n\n# 3D点云配准(3D Point Cloud Registration)\n\n**ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning**\n\n- Paper: https://arxiv.org/abs/2103.15231\n- Code: None\n\n**PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency**\n\n- Paper: https://arxiv.org/abs/2103.05465\n- Code: 
https://github.com/XuyangBai/PointDSC \n\n**PREDATOR: Registration of 3D Point Clouds with Low Overlap**\n\n- Paper: https://arxiv.org/abs/2011.13005\n- Code: https://github.com/ShengyuH/OverlapPredator\n\n<a name=\"3D-Point-Cloud-Completion\"></a>\n\n# 3D点云补全(3D Point Cloud Completion)\n\n**Unsupervised 3D Shape Completion through GAN Inversion**\n\n- Homepage: https://junzhezhang.github.io/projects/ShapeInversion/\n- Paper: https://arxiv.org/abs/2104.13366 \n- Code: https://github.com/junzhezhang/shape-inversion \n\n**Variational Relational Point Completion Network**\n\n- Homepage:  https://paul007pl.github.io/projects/VRCNet \n- Paper: https://arxiv.org/abs/2104.10154\n- Code: https://github.com/paul007pl/VRCNet\n\n**Style-based Point Generator with Adversarial Rendering for Point Cloud Completion**\n\n- Homepage: https://alphapav.github.io/SpareNet/\n\n- Paper: https://arxiv.org/abs/2103.02535\n- Code: https://github.com/microsoft/SpareNet\n\n<a name=\"3D-Reconstruction\"></a>\n\n# 3D重建(3D Reconstruction)\n\n**Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection**\n\n- Paper: http://arxiv.org/abs/2106.07852\n- Code: https://github.com/TencentYoutuResearch/3DFaceReconstruction-LAP\n\n**Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction**\n\n- Paper: https://arxiv.org/abs/2104.00858\n- Code: None\n\n**NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video**\n\n- Homepage: https://zju3dv.github.io/neuralrecon/\n\n- Paper(Oral): https://arxiv.org/abs/2104.00681\n- Code: https://github.com/zju3dv/NeuralRecon\n\n<a name=\"6D-Pose-Estimation\"></a>\n\n# 6D位姿估计(6D Pose Estimation)\n\n**FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism**\n\n- Paper(Oral): https://arxiv.org/abs/2103.07054\n- Code: https://github.com/DC1991/FS-Net\n\n**GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation**\n\n- Paper: 
http://arxiv.org/abs/2102.12145\n- Code: https://git.io/GDR-Net\n\n**FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation**\n\n- Paper: https://arxiv.org/abs/2103.02242\n- Code: https://github.com/ethnhe/FFB6D\n\n<a name=\"Camera-Pose-Estimation\"></a>\n\n# 相机姿态估计\n\n**Back to the Feature: Learning Robust Camera Localization from Pixels to Pose**\n\n- Paper: https://arxiv.org/abs/2103.09213\n- Code: https://github.com/cvg/pixloc\n\n<a name=\"Depth-Estimation\"></a>\n\n# 深度估计(Depth Estimation)\n\n**S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation**\n\n- Paper(Oral): https://arxiv.org/abs/2104.00877\n- Code: None\n\n**Beyond Image to Depth: Improving Depth Prediction using Echoes**\n\n- Homepage: https://krantiparida.github.io/projects/bimgdepth.html\n- Paper: https://arxiv.org/abs/2103.08468\n- Code: https://github.com/krantiparida/beyond-image-to-depth\n\n**S3: Learnable Sparse Signal Superdensity for Guided Depth Estimation**\n\n- Paper: https://arxiv.org/abs/2103.02396\n- Code: None\n\n**Depth from Camera Motion and Object Detection**\n\n- Paper: https://arxiv.org/abs/2103.01468\n- Code: https://github.com/griffbr/ODMD\n- Dataset: https://github.com/griffbr/ODMD\n\n<a name=\"Stereo-Matching\"></a>\n\n# 立体匹配(Stereo Matching)\n\n**A Decomposition Model for Stereo Matching**\n\n- Paper: https://arxiv.org/abs/2104.07516\n- Code: None\n\n<a name=\"Flow-Estimation\"></a>\n\n# 光流估计(Flow Estimation)\n\n**Self-Supervised Multi-Frame Monocular Scene Flow**\n\n- Paper: https://arxiv.org/abs/2105.02216\n- Code: https://github.com/visinf/multi-mono-sf\n\n**RAFT-3D: Scene Flow using Rigid-Motion Embeddings**\n\n- Paper: https://arxiv.org/abs/2012.00726v1\n- Code: None\n\n**Learning Optical Flow From Still Images**\n\n- Homepage: https://mattpoggi.github.io/projects/cvpr2021aleotti/\n\n- Paper: https://mattpoggi.github.io/assets/papers/aleotti2021cvpr.pdf\n- Code: https://github.com/mattpoggi/depthstillation\n\n**FESTA: Flow 
Estimation via Spatial-Temporal Attention for Scene Point Clouds**\n\n- Paper: https://arxiv.org/abs/2104.00798\n- Code: None\n\n<a name=\"Lane-Detection\"></a>\n\n# 车道线检测(Lane Detection)\n\n**Focus on Local: Detecting Lane Marker from Bottom Up via Key Point**\n\n- Paper: https://arxiv.org/abs/2105.13680\n- Code: None\n\n**Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection**\n\n- Paper: https://arxiv.org/abs/2010.12035\n- Code: https://github.com/lucastabelini/LaneATT \n\n<a name=\"Trajectory-Prediction\"></a>\n\n# 轨迹预测(Trajectory Prediction)\n\n**Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction**\n\n- Paper(Oral): https://arxiv.org/abs/2104.08277\n- Code: None\n\n<a name=\"Crowd-Counting\"></a>\n\n# 人群计数(Crowd Counting)\n\n**Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark**\n\n- Paper: https://arxiv.org/abs/2105.02440\n\n- Code: https://github.com/VisDrone/DroneCrowd\n\n- Dataset: https://github.com/VisDrone/DroneCrowd\n\n<a name=\"AE\"></a>\n\n# 对抗样本(Adversarial Examples)\n\n**Enhancing the Transferability of Adversarial Attacks through Variance Tuning**\n\n- Paper: https://arxiv.org/abs/2103.15571\n- Code: https://github.com/JHL-HUST/VT\n\n**LiBRe: A Practical Bayesian Approach to Adversarial Detection**\n\n- Paper: https://arxiv.org/abs/2103.14835\n- Code: None\n\n**Natural Adversarial Examples**\n\n- Paper: https://arxiv.org/abs/1907.07174\n- Code: https://github.com/hendrycks/natural-adv-examples\n\n<a name=\"Image-Retrieval\"></a>\n\n# 图像检索(Image Retrieval)\n\n**StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval**\n\n- Paper: https://arxiv.org/abs/2103.15706\n- Code: None\n\n**QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval**\n\n- Paper: https://arxiv.org/abs/2103.02927\n- Code: None\n\n<a name=\"Video-Retrieval\"></a>\n\n# 视频检索(Video Retrieval)\n\n**On Semantic Similarity in Video Retrieval**\n\n- Paper: https://arxiv.org/abs/2103.10095\n\n- Homepage: 
https://mwray.github.io/SSVR/\n- Code: https://github.com/mwray/Semantic-Video-Retrieval\n\n<a name=\"Cross-modal-Retrieval\"></a>\n\n# 跨模态检索(Cross-modal Retrieval)\n\n**Cross-Modal Center Loss for 3D Cross-Modal Retrieval**\n\n- Paper: https://arxiv.org/abs/2008.03561\n- Code: https://github.com/LongLong-Jing/Cross-Modal-Center-Loss \n\n**Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers**\n\n- Paper: https://arxiv.org/abs/2103.16553\n- Code: None\n\n**Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning**\n\n- Paper: https://www.amazon.science/publications/revamping-cross-modal-recipe-retrieval-with-hierarchical-transformers-and-self-supervised-learning\n\n- Code: https://github.com/amzn/image-to-recipe-transformers\n\n<a name=\"Zero-Shot-Learning\"></a>\n\n# Zero-Shot Learning\n\n**Counterfactual Zero-Shot and Open-Set Visual Recognition**\n\n- Paper: https://arxiv.org/abs/2103.00887\n- Code: https://github.com/yue-zhongqi/gcm-cf\n\n<a name=\"Federated-Learning\"></a>\n\n# 联邦学习(Federated Learning)\n\n**FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space**\n\n- Paper: https://arxiv.org/abs/2103.06030\n- Code: https://github.com/liuquande/FedDG-ELCFS\n\n<a name=\"Video-Frame-Interpolation\"></a>\n\n# 视频插帧(Video Frame Interpolation)\n\n**CDFI: Compression-Driven Network Design for Frame Interpolation**\n\n- Paper: None\n- Code: https://github.com/tding1/CDFI\n\n**FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation**\n\n- Homepage: https://tarun005.github.io/FLAVR/\n\n- Paper: https://arxiv.org/abs/2012.08512\n- Code: https://github.com/tarun005/FLAVR\n\n<a name=\"Visual-Reasoning\"></a>\n\n# 视觉推理(Visual Reasoning)\n\n**Transformation Driven Visual Reasoning**\n\n- Homepage: https://hongxin2019.github.io/TVR/\n- Paper: https://arxiv.org/abs/2011.13160\n- Code: https://github.com/hughplay/TVR\n\n<a 
name=\"Image-Synthesis\"></a>\n\n# 图像合成(Image Synthesis)\n\n**GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields**\n\n- Homepage: https://m-niemeyer.github.io/project-pages/giraffe/index.html\n- Paper(Oral): https://arxiv.org/abs/2011.12100\n\n- Code: https://github.com/autonomousvision/giraffe\n\n- Demo: http://www.youtube.com/watch?v=fIaDXC-qRSg&vq=hd1080&autoplay=1\n\n**Taming Transformers for High-Resolution Image Synthesis**\n\n- Homepage: https://compvis.github.io/taming-transformers/\n- Paper(Oral): https://arxiv.org/abs/2012.09841\n- Code: https://github.com/CompVis/taming-transformers\n\n<a name=\"Visual-Synthesis\"></a>\n\n# 视图合成(View Synthesis)\n\n**Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes**\n\n- Homepage: https://virtualhumans.mpi-inf.mpg.de/srf/\n- Paper: https://arxiv.org/abs/2104.06935\n\n**Self-Supervised Visibility Learning for Novel View Synthesis**\n\n- Paper: https://arxiv.org/abs/2103.15407\n- Code: None\n\n**NeX: Real-time View Synthesis with Neural Basis Expansion**\n\n- Homepage: https://nex-mpi.github.io/\n- Paper(Oral): https://arxiv.org/abs/2103.05606\n\n<a name=\"Style-Transfer\"></a>\n\n# 风格迁移(Style Transfer)\n\n**Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer**\n\n- Paper: https://arxiv.org/abs/2104.05376\n- Code: https://github.com/PaddlePaddle/PaddleGAN/\n\n<a name=\"Layout-Generation\"></a>\n\n# 布局生成(Layout Generation)\n\n**LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity**\n\n- Paper: None\n- Code: None\n\n**Variational Transformer Networks for Layout Generation**\n\n- Paper: https://arxiv.org/abs/2104.02416\n- Code: None\n\n<a name=\"Domain-Generalization\"></a>\n\n# Domain Generalization\n\n**Generalization on Unseen Domains via Inference-time Label-Preserving Target Projections**\n\n- Paper(Oral): 
https://openaccess.thecvf.com/content/CVPR2021/papers/Pandey_Generalization_on_Unseen_Domains_via_Inference-Time_Label-Preserving_Target_Projections_CVPR_2021_paper.pdf\n- Code: https://github.com/VSumanth99/InferenceTimeDG\n\n**Generalizable Person Re-identification with Relevance-aware Mixture of Experts**\n\n- Paper: https://arxiv.org/abs/2105.09156\n- Code: None\n\n**RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening**\n\n- Paper: https://arxiv.org/abs/2103.15597\n- Code: https://github.com/shachoi/RobustNet\n\n**Adaptive Methods for Real-World Domain Generalization**\n\n- Paper: https://arxiv.org/abs/2103.15796\n- Code: None\n\n**FSDR: Frequency Space Domain Randomization for Domain Generalization**\n\n- Paper: https://arxiv.org/abs/2103.02370\n- Code: None\n\n<a name=\"Domain-Adaptation\"></a>\n\n# Domain Adaptation\n\n**Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation**\n\n- Paper: https://arxiv.org/abs/2104.00808\n- Code: None\n\n**Domain Consensus Clustering for Universal Domain Adaptation**\n\n- Paper: http://reler.net/papers/guangrui_cvpr2021.pdf\n- Code: https://github.com/Solacex/Domain-Consensus-Clustering \n\n<a name=\"Open-Set\"></a>\n\n# Open-Set\n\n**Towards Open World Object Detection**\n\n- Paper(Oral): https://arxiv.org/abs/2103.02603\n- Code: https://github.com/JosephKJ/OWOD\n\n**Exemplar-Based Open-Set Panoptic Segmentation Network**\n\n- Homepage: https://cv.snu.ac.kr/research/EOPSN/\n- Paper: https://arxiv.org/abs/2105.08336\n- Code: https://github.com/jd730/EOPSN\n\n**Learning Placeholders for Open-Set Recognition**\n\n- Paper(Oral): https://arxiv.org/abs/2103.15086\n- Code: None\n\n<a name=\"Adversarial-Attack\"></a>\n\n# Adversarial Attack\n\n**IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking**\n\n- Paper: https://arxiv.org/abs/2103.14938\n- Code: https://github.com/VISION-SJTU/IoUattack\n\n<a name=\"HOI\"></a>\n\n# 
\"人-物\"交互(HOI)检测\n\n**HOTR: End-to-End Human-Object Interaction Detection with Transformers**\n\n- Paper: https://arxiv.org/abs/2104.13682\n- Code: None\n\n**Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information**\n\n- Paper: https://arxiv.org/abs/2103.05399\n- Code: https://github.com/hitachi-rd-cv/qpic\n\n**Reformulating HOI Detection as Adaptive Set Prediction**\n\n- Paper: https://arxiv.org/abs/2103.05983\n- Code: https://github.com/yoyomimi/AS-Net\n\n**Detecting Human-Object Interaction via Fabricated Compositional Learning**\n\n- Paper: https://arxiv.org/abs/2103.08214\n- Code: https://github.com/zhihou7/FCL\n\n**End-to-End Human Object Interaction Detection with HOI Transformer**\n\n- Paper: https://arxiv.org/abs/2103.04503\n- Code: https://github.com/bbepoch/HoiTransformer\n\n<a name=\"Shadow-Removal\"></a>\n\n# 阴影去除(Shadow Removal)\n\n**Auto-Exposure Fusion for Single-Image Shadow Removal**\n\n- Paper: https://arxiv.org/abs/2103.01255\n- Code: https://github.com/tsingqguo/exposure-fusion-shadow-removal\n\n<a name=\"Virtual-Try-On\"></a>\n\n# 虚拟换衣(Virtual Try-On)\n\n**Parser-Free Virtual Try-on via Distilling Appearance Flows**\n\n**基于外观流蒸馏的无需人体解析的虚拟换装**\n\n- Paper: https://arxiv.org/abs/2103.04559\n- Code: https://github.com/geyuying/PF-AFN \n\n<a name=\"Label-Noise\"></a>\n\n# 标签噪声(Label Noise)\n\n**A Second-Order Approach to Learning with Instance-Dependent Label Noise**\n\n- Paper(Oral): https://arxiv.org/abs/2012.11854\n- Code: https://github.com/UCSC-REAL/CAL\n\n<a name=\"Video-Stabilization\"></a>\n\n# 视频稳像(Video Stabilization)\n\n**Real-Time Selfie Video Stabilization**\n\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Yu_Real-Time_Selfie_Video_Stabilization_CVPR_2021_paper.pdf\n\n- Code: https://github.com/jiy173/selfievideostabilization\n\n<a name=\"Datasets\"></a>\n\n# 数据集(Datasets)\n\n**Tracking Pedestrian Heads in Dense Crowd**\n\n- Homepage: 
https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Sundararaman_Tracking_Pedestrian_Heads_in_Dense_Crowd_CVPR_2021_paper.html\n- Code1: https://github.com/Sentient07/HeadHunter\n- Code2: https://github.com/Sentient07/HeadHunter%E2%80%93T\n- Dataset: https://project.inria.fr/crowdscience/project/dense-crowd-head-tracking/\n\n**Part-aware Panoptic Segmentation**\n\n- Paper: https://arxiv.org/abs/2106.06351\n- Code: https://github.com/tue-mps/panoptic_parts\n- Dataset: https://github.com/tue-mps/panoptic_parts\n\n**Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos**\n\n- Homepage: https://www.yasamin.page/hdnet_tiktok\n\n- Paper(Oral): https://arxiv.org/abs/2103.03319\n\n- Code: https://github.com/yasaminjafarian/HDNet_TikTok\n\n- Dataset: https://www.yasamin.page/hdnet_tiktok#h.jr9ifesshn7v\n\n**High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network**\n\n- Paper: https://arxiv.org/abs/2105.09188\n- Code: https://github.com/csjliang/LPTN\n- Dataset: https://github.com/csjliang/LPTN\n\n**Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark**\n\n- Paper: https://arxiv.org/abs/2105.02440\n\n- Code: https://github.com/VisDrone/DroneCrowd\n\n- Dataset: https://github.com/VisDrone/DroneCrowd\n\n**Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets**\n\n- Homepage: https://fidler-lab.github.io/efficient-annotation-cookbook/\n- Paper(Oral): https://arxiv.org/abs/2104.12690\n- Code: https://github.com/fidler-lab/efficient-annotation-cookbook\n\n**ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation**\n\n- Paper: https://arxiv.org/abs/2012.05258\n- Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab\n- Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab\n\n**Learning To Count Everything**\n\n- Paper: 
https://arxiv.org/abs/2104.08391\n- Code: https://github.com/cvlab-stonybrook/LearningToCountEverything\n- Dataset: https://github.com/cvlab-stonybrook/LearningToCountEverything\n\n**Semantic Image Matting**\n\n- Paper: https://arxiv.org/abs/2104.08201\n- Code: https://github.com/nowsyn/SIM\n- Dataset: https://github.com/nowsyn/SIM\n\n**Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline**\n\n- Homepage: http://mepro.bjtu.edu.cn/resource.html\n- Paper: https://arxiv.org/abs/2104.06174\n- Code: None\n\n**Visual Semantic Role Labeling for Video Understanding**\n\n- Homepage: https://vidsitu.org/\n\n- Paper: https://arxiv.org/abs/2104.00990\n- Code: https://github.com/TheShadow29/VidSitu\n- Dataset: https://github.com/TheShadow29/VidSitu\n\n**VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild**\n\n- Homepage: https://www.vspwdataset.com/\n- Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf\n- GitHub: https://github.com/sssdddwww2/vspw_dataset_download\n\n**Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark**\n\n- Homepage: https://vap.aau.dk/sewer-ml/\n- Paper: https://arxiv.org/abs/2103.10619\n\n**Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark**\n\n- Homepage: https://vap.aau.dk/sewer-ml/\n\n- Paper: https://arxiv.org/abs/2103.10895\n\n**Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food**\n\n- Paper: https://arxiv.org/abs/2103.03375\n- Dataset: None\n\n **Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges**\n\n- Homepage: https://github.com/QingyongHu/SensatUrban\n- Paper: http://arxiv.org/abs/2009.03137\n- Code: https://github.com/QingyongHu/SensatUrban\n- Dataset: https://github.com/QingyongHu/SensatUrban\n\n**When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework**\n\n- Paper(Oral): https://arxiv.org/abs/2103.01520\n- Code: 
https://github.com/Hzzone/MTLFace\n- Dataset: https://github.com/Hzzone/MTLFace\n\n**Depth from Camera Motion and Object Detection**\n\n- Paper: https://arxiv.org/abs/2103.01468\n- Code: https://github.com/griffbr/ODMD\n- Dataset: https://github.com/griffbr/ODMD\n\n**There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge**\n\n- Homepage: http://rl.uni-freiburg.de/research/multimodal-distill\n- Paper: https://arxiv.org/abs/2103.01353\n- Code: http://rl.uni-freiburg.de/research/multimodal-distill\n\n**Scan2Cap: Context-aware Dense Captioning in RGB-D Scans**\n\n- Paper: https://arxiv.org/abs/2012.02206\n- Code: https://github.com/daveredrum/Scan2Cap\n\n- Dataset: https://github.com/daveredrum/ScanRefer\n\n**There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge**\n\n- Paper: https://arxiv.org/abs/2103.01353\n- Code: http://rl.uni-freiburg.de/research/multimodal-distill\n- Dataset: http://rl.uni-freiburg.de/research/multimodal-distill\n\n<a name=\"Others\"></a>\n\n# 其他(Others)\n\n**Fast and Accurate Model Scaling**\n\n- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Dollar_Fast_and_Accurate_Model_Scaling_CVPR_2021_paper.html\n\n- Code: https://github.com/facebookresearch/pycls\n\n**Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos**\n\n- Homepage: https://www.yasamin.page/hdnet_tiktok\n\n- Paper(Oral): https://arxiv.org/abs/2103.03319\n\n- Code: https://github.com/yasaminjafarian/HDNet_TikTok\n\n- Dataset: https://www.yasamin.page/hdnet_tiktok#h.jr9ifesshn7v\n\n**Omnimatte: Associating Objects and Their Effects in Video**\n\n- Homepage: https://omnimatte.github.io/\n\n- Paper(Oral): https://arxiv.org/abs/2105.06993\n- Code: https://omnimatte.github.io/#code\n\n**Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets**\n\n- Homepage: 
https://fidler-lab.github.io/efficient-annotation-cookbook/\n- Paper(Oral): https://arxiv.org/abs/2104.12690\n- Code: https://github.com/fidler-lab/efficient-annotation-cookbook\n\n**Motion Representations for Articulated Animation**\n\n- Paper: https://arxiv.org/abs/2104.11280\n- Code: https://github.com/snap-research/articulated-animation\n\n**Deep Lucas-Kanade Homography for Multimodal Image Alignment**\n\n- Paper: https://arxiv.org/abs/2104.11693\n- Code: https://github.com/placeforyiming/CVPR21-Deep-Lucas-Kanade-Homography\n\n**Skip-Convolutions for Efficient Video Processing**\n\n- Paper: https://arxiv.org/abs/2104.11487\n- Code: None\n\n**KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control**\n\n- Homepage: http://tomasjakab.github.io/KeypointDeformer\n\n- Paper(Oral): https://arxiv.org/abs/2104.11224\n- Code: https://github.com/tomasjakab/keypoint_deformer/\n\n**Learning To Count Everything**\n\n- Paper: https://arxiv.org/abs/2104.08391\n- Code: https://github.com/cvlab-stonybrook/LearningToCountEverything\n- Dataset: https://github.com/cvlab-stonybrook/LearningToCountEverything\n\n**SOLD2: Self-supervised Occlusion-aware Line Description and Detection**\n\n- Paper(Oral): https://arxiv.org/abs/2104.03362\n- Code: https://github.com/cvg/SOLD2\n\n**Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression**\n\n- Homepage: https://li-wanhua.github.io/POEs/\n- Paper:  https://arxiv.org/abs/2103.13629\n- Code: https://github.com/Li-Wanhua/POEs\n\n**LEAP: Learning Articulated Occupancy of People**\n\n- Paper: https://arxiv.org/abs/2104.06849\n- Code: None\n\n**Visual Semantic Role Labeling for Video Understanding**\n\n- Homepage: https://vidsitu.org/\n\n- Paper: https://arxiv.org/abs/2104.00990\n- Code: https://github.com/TheShadow29/VidSitu\n- Dataset: https://github.com/TheShadow29/VidSitu\n\n**UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles**\n\n- Paper: 
https://arxiv.org/abs/2104.00946\n- Code: https://github.com/SUTDCV/UAV-Human \n\n**Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning**\n\n- Paper(Oral): https://arxiv.org/abs/2104.00924\n- Code: None\n\n**Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction**\n\n- Paper: https://arxiv.org/abs/2104.00858\n- Code: None\n\n**Towards High Fidelity Face Relighting with Realistic Shadows**\n\n- Paper: https://arxiv.org/abs/2104.00825\n- Code: None\n\n**BRepNet: A topological message passing system for solid models**\n\n- Paper(Oral): https://arxiv.org/abs/2104.00706\n- Code: None\n\n**Visually Informed Binaural Audio Generation without Binaural Audios**\n\n- Homepage: https://sheldontsui.github.io/projects/PseudoBinaural\n- Paper: None\n\n- GitHub: https://github.com/SheldonTsui/PseudoBinaural_CVPR2021\n- Demo: https://www.youtube.com/watch?v=r-uC2MyAWQc\n\n**Exploring intermediate representation for monocular vehicle pose estimation**\n\n- Paper: None\n- Code: https://github.com/Nicholasli1995/EgoNet\n\n**Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGB**\n\n- Paper(Oral): https://arxiv.org/abs/2103.14708\n- Code: None\n\n**Invertible Image Signal Processing**\n\n- Paper: https://arxiv.org/abs/2103.15061\n- Code: https://github.com/yzxing87/Invertible-ISP\n\n**Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling**\n\n- Paper: https://arxiv.org/abs/2103.14858\n- Code: None\n\n**SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences**\n\n- Paper: https://arxiv.org/abs/2103.14898\n- Code: None\n\n**Embedding Transfer with Label Relaxation for Improved Metric Learning**\n\n- Paper: https://arxiv.org/abs/2103.14908\n- Code: None\n\n**Picasso: A CUDA-based Library for Deep Learning over 3D Meshes**\n\n- Paper: https://arxiv.org/abs/2103.15076 \n- Code: https://github.com/hlei-ziyan/Picasso\n\n**Meta-Mining Discriminative Samples 
for Kinship Verification**\n\n- Paper: https://arxiv.org/abs/2103.15108\n- Code: None\n\n**Cloud2Curve: Generation and Vectorization of Parametric Sketches**\n\n- Paper: https://arxiv.org/abs/2103.15536\n- Code: None\n\n**TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events**\n\n- Paper: https://arxiv.org/abs/2103.15538\n- Code: https://github.com/SUTDCV/SUTD-TrafficQA\n\n**Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution**\n\n- Homepage: http://wellyzhang.github.io/project/prae.html\n\n- Paper: https://arxiv.org/abs/2103.14230\n- Code: None\n\n**ACRE: Abstract Causal REasoning Beyond Covariation**\n\n- Homepage: http://wellyzhang.github.io/project/acre.html\n\n- Paper: https://arxiv.org/abs/2103.14232\n- Code: None\n\n**Confluent Vessel Trees with Accurate Bifurcations**\n\n- Paper: https://arxiv.org/abs/2103.14268\n- Code: None\n\n**Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling**\n\n- Paper: https://arxiv.org/abs/2103.14338\n- Code: https://github.com/HuangZhiChao95/FewShotMotionTransfer\n\n**Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks**\n\n- Homepage: https://paschalidoud.github.io/neural_parts\n- Paper: None\n- Code: https://github.com/paschalidoud/neural_parts\n\n**Knowledge Evolution in Neural Networks**\n\n- Paper(Oral): https://arxiv.org/abs/2103.05152\n- Code: https://github.com/ahmdtaha/knowledge_evolution\n\n**Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning**\n\n- Paper: https://arxiv.org/abs/2103.02148\n- Code: https://github.com/guopengf/FLMRCM\n\n**SGP: Self-supervised Geometric Perception**\n\n- Paper(Oral): https://arxiv.org/abs/2103.03114\n- Code: https://github.com/theNded/SGP\n\n**Diffusion Probabilistic Models for 3D 
Point Cloud Generation**\n\n- Paper: https://arxiv.org/abs/2103.01458\n- Code: https://github.com/luost26/diffusion-point-cloud\n\n**Scan2Cap: Context-aware Dense Captioning in RGB-D Scans**\n\n- Paper: https://arxiv.org/abs/2012.02206\n- Code: https://github.com/daveredrum/Scan2Cap\n\n- Dataset: https://github.com/daveredrum/ScanRefer\n\n**There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge**\n\n- Paper: https://arxiv.org/abs/2103.01353\n- Code: http://rl.uni-freiburg.de/research/multimodal-distill\n\n- Dataset: http://rl.uni-freiburg.de/research/multimodal-distill\n\n<a name=\"TO-DO\"></a>\n\n# TODO\n\n- [Tencent YouTu Lab: 20 papers accepted to CVPR 2021](https://mp.weixin.qq.com/s/McAtOVh0osWZ3uppEoHC8A)\n- [MePro team: three papers accepted to CVPR 2021](https://mp.weixin.qq.com/s/GD5Zb6u_MQ8GZIAGeCGo3Q)\n\n<a name=\"Not-Sure\"></a>\n\n# Not Sure\n\n**CT Film Recovery via Disentangling Geometric Deformation and Photometric Degradation: Simulated Datasets and Deep Models**\n\n- Paper: None\n- Code: https://github.com/transcendentsky/Film-Recovery\n\n**Toward Explainable Reflection Removal with Distilling and Model Uncertainty**\n\n- Paper: None\n- Code: https://github.com/ytpeng-aimlab/CVPR-2021-Toward-Explainable-Reflection-Removal-with-Distilling-and-Model-Uncertainty\n\n**DeepOIS: Gyroscope-Guided Deep Optical Image Stabilizer Compensation**\n\n- Paper: None\n- Code: https://github.com/lhaippp/DeepOIS\n\n**Exploring Adversarial Fake Images on Face Manifold**\n\n- Paper: None\n- Code: https://github.com/ldz666666/Style-atk\n\n**Uncertainty-Aware Semi-Supervised Crowd Counting via Consistency-Regularized Surrogate Task**\n\n- Paper: None\n- Code: 
https://github.com/yandamengdanai/Uncertainty-Aware-Semi-Supervised-Crowd-Counting-via-Consistency-Regularized-Surrogate-Task\n\n**Temporal Contrastive Graph for Self-supervised Video Representation Learning**\n\n- Paper: None\n- Code: https://github.com/YangLiu9208/TCG\n\n**Boosting Monocular Depth Estimation Models to High-Resolution via Context-Aware Patching**\n\n- Paper: None\n- Code: https://github.com/ouranonymouscvpr/cvpr2021_ouranonymouscvpr\n\n**Fast and Memory-Efficient Compact Bilinear Pooling**\n\n- Paper: None\n- Code: https://github.com/cvpr2021kp2/cvpr2021kp2\n\n**Identification of Empty Shelves in Supermarkets using Domain-inspired Features with Structural Support Vector Machine**\n\n- Paper: None\n- Code: https://github.com/gapDetection/cvpr2021\n\n**Estimating A Child's Growth Potential From Cephalometric X-Ray Image via Morphology-Aware Interactive Keypoint Estimation**\n\n- Paper: None\n- Code: https://github.com/interactivekeypoint2020/Morph\n\nhttps://github.com/ShaoQiangShen/CVPR2021\n\nhttps://github.com/gillesflash/CVPR2021\n\nhttps://github.com/anonymous-submission1991/BaLeNAS\n\nhttps://github.com/cvpr2021dcb/cvpr2021dcb\n\nhttps://github.com/anonymousauthorCV/CVPR2021_PaperID_8578\n\nhttps://github.com/AldrichZeng/FreqPrune\n\nhttps://github.com/Anonymous-AdvCAM/Anonymous-AdvCAM\n\nhttps://github.com/ddfss/datadrive-fss\n\n"
  },
  {
    "path": "CVPR2022-Papers-with-Code.md",
    "content": "# CVPR 2022 Papers and Open-Source Projects (Papers with Code)\n\nA collection of [CVPR 2022](https://cvpr2022.thecvf.com/) papers and open-source projects (papers with code).\n\nList of accepted CVPR 2022 paper IDs: https://drive.google.com/file/d/15JFhfPboKdUcIH9LdbCMUFmGq_JhaxhC/view\n\n> Note 1: Issues sharing CVPR 2022 papers and open-source projects are welcome!\n>\n> Note 2: For papers from top CV conferences in previous years, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision\n>\n> - [CVPR 2019](CVPR2019-Papers-with-Code.md)\n> - [CVPR 2020](CVPR2020-Papers-with-Code.md)\n> - [CVPR 2021](CVPR2021-Papers-with-Code.md)\n\nTo keep up with the latest high-quality CV papers, open-source projects, and learning materials, scan the QR code to join the CVer academic discussion group and learn together.\n\n![](CVer学术交流群.png)\n\n## CVPR 2022 Open-Source Paper Index\n\n- [Backbone](#Backbone)\n- [CLIP](#CLIP)\n- [GAN](#GAN)\n- [GNN](#GNN)\n- [MLP](#MLP)\n- [NAS](#NAS)\n- [OCR](#OCR)\n- [NeRF](#NeRF)\n- [3D Face](#3D Face)\n- [Long-Tail](#Long-Tail)\n- [Visual Transformer](#Visual-Transformer)\n- [Vision-Language](#VL)\n- [Self-supervised Learning](#SSL)\n- [Data Augmentation](#DA)\n- [Knowledge Distillation](#KD)\n- [Object Detection](#Object-Detection)\n- [Visual Tracking](#VT)\n- [Semantic Segmentation](#Semantic-Segmentation)\n- [Instance Segmentation](#Instance-Segmentation)\n- [Panoptic Segmentation](#Panoptic-Segmentation)\n- [Few-Shot Classification](#FFC)\n- [Few-Shot Segmentation](#FFS)\n- [Image Matting](#Matting)\n- [Video Understanding](#VU)\n- [Image Editing](#Image-Editing)\n- [Low-level Vision](#LLV)\n- [Super-Resolution](#Super-Resolution)\n- [Deblur](#Deblur)\n- [3D Point Cloud](#3D-Point-Cloud)\n- [3D Object Detection](#3D-Object-Detection)\n- [3D Semantic Segmentation](#3DSS)\n- [3D Object Tracking](#3D-Object-Tracking)\n- [3D Human Pose Estimation](#3D-Human-Pose-Estimation)\n- [3D Semantic Scene Completion](#3DSSC)\n- [3D Reconstruction](#3D-R)\n- [Person Re-identification](#ReID)\n- [Camouflaged Object Detection](#COD)\n- [Depth 
Estimation](#Depth-Estimation)\n- [Stereo Matching](#Stereo-Matching)\n- [Feature Matching](#FM)\n- [Lane Detection](#Lane-Detection)\n- [Optical Flow Estimation](#Optical-Flow-Estimation)\n- [Image Inpainting](#Image-Inpainting)\n- [Image Retrieval](#Image-Retrieval)\n- [Face Recognition](#Face-Recognition)\n- [Crowd Counting](#Crowd-Counting)\n- [Medical Image](#Medical-Image)\n- [Video Generation](#Video Generation)\n- [Scene Graph Generation](#Scene-Graph-Generation)\n- [Referring Video Object Segmentation](#R-VOS)\n- [Gait Recognition](#GR)\n- [Style Transfer](#ST)\n- [Anomaly Detection](#AD)\n- [Adversarial Examples](#AE)\n- [Weakly Supervised Object Localization](#WSOL)\n- [Radar Object Detection](#ROD)\n- [Hyperspectral Image Reconstruction](#HSI)\n- [Image Stitching](#Image-Stitching)\n- [Watermarking](#Watermarking)\n- [Action Counting](#AC)\n- [Grounded Situation Recognition](#GSR)\n- [Zero-shot Learning](#ZSL)\n- [DeepFakes](#DeepFakes)\n- [Datasets](#Datasets)\n- [New Tasks](#New-Tasks)\n- [Others](#Others)\n\n<a name=\"Backbone\"></a>\n\n# Backbone\n\n**A ConvNet for the 2020s**\n\n- Paper: https://arxiv.org/abs/2201.03545\n- Code: https://github.com/facebookresearch/ConvNeXt\n- Chinese explainer: https://mp.weixin.qq.com/s/Xg5wPYExnvTqRo6s5-2cAw\n\n**Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs**\n\n- Paper: https://arxiv.org/abs/2203.06717\n\n- Code: https://github.com/megvii-research/RepLKNet\n- Code2: https://github.com/DingXiaoH/RepLKNet-pytorch\n\n- Chinese explainer: https://mp.weixin.qq.com/s/_qXyIQut-JRW6VvsjaQlFg\n\n**MPViT: Multi-Path Vision Transformer for Dense Prediction**\n\n- Paper: https://arxiv.org/abs/2112.11010\n- Code: https://github.com/youngwanLEE/MPViT\n- Chinese explainer: https://mp.weixin.qq.com/s/Q9-crEOz5IYzZaNoq8oXfg\n\n**Mobile-Former: Bridging MobileNet and Transformer**\n\n- Paper: 
https://arxiv.org/abs/2108.05895\n- Code: None\n- Chinese explainer: https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ\n\n**MetaFormer is Actually What You Need for Vision**\n\n- Paper: https://arxiv.org/abs/2111.11418\n- Code: https://github.com/sail-sg/poolformer\n\n**Shunted Self-Attention via Multi-Scale Token Aggregation**\n\n- Paper(Oral): https://arxiv.org/abs/2111.15193\n- Code: https://github.com/OliverRensu/Shunted-Transformer\n\n**TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing**\n\n- Paper: http://arxiv.org/abs/2203.10489\n- Code: https://github.com/JierunChen/TVConv\n\n**Learned Queries for Efficient Local Attention**\n\n- Paper(Oral): https://arxiv.org/abs/2112.11435\n- Code: https://github.com/moabarar/qna\n\n**RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality**\n\n- Paper: https://arxiv.org/abs/2112.11081\n- Code: https://github.com/DingXiaoH/RepMLP\n\n<a name=\"CLIP\"></a>\n\n# CLIP\n\n**HairCLIP: Design Your Hair by Text and Reference Image**\n\n- Paper: https://arxiv.org/abs/2112.05142\n\n- Code: https://github.com/wty-ustc/HairCLIP\n\n**PointCLIP: Point Cloud Understanding by CLIP**\n\n- Paper: https://arxiv.org/abs/2112.02413\n- Code: https://github.com/ZrrSkywalker/PointCLIP\n\n**Blended Diffusion for Text-driven Editing of Natural Images**\n\n- Paper: https://arxiv.org/abs/2111.14818\n\n- Code: https://github.com/omriav/blended-diffusion\n\n<a name=\"GAN\"></a>\n\n# GAN\n\n**SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing**\n\n- Homepage: https://semanticstylegan.github.io/\n\n- Paper: https://arxiv.org/abs/2112.02236\n- Demo: https://semanticstylegan.github.io/videos/demo.mp4\n\n**Style Transformer for Image Inversion and Editing**\n\n- Paper: https://arxiv.org/abs/2203.07932\n- Code: https://github.com/sapphire497/style-transformer\n\n**Unsupervised Image-to-Image Translation with Generative Prior**\n\n- Homepage: 
https://www.mmlab-ntu.com/project/gpunit/\n- Paper: https://arxiv.org/abs/2204.03641\n- Code: https://github.com/williamyang1991/GP-UNIT\n\n**StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2**\n\n- Homepage: https://universome.github.io/stylegan-v\n- Paper: https://arxiv.org/abs/2112.14683\n- Code: https://github.com/universome/stylegan-v\n\n**OSSGAN: Open-set Semi-supervised Image Generation**\n\n- Paper: https://arxiv.org/abs/2204.14249\n- Code: https://github.com/raven38/OSSGAN\n\n**Neural Texture Extraction and Distribution for Controllable Person Image Synthesis**\n\n- Paper: https://arxiv.org/abs/2204.06160\n- Code: https://github.com/RenYurui/Neural-Texture-Extraction-Distribution\n\n<a name=\"GNN\"></a>\n\n# GNN\n\n**OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks**\n\n- Paper: https://wanyu-lin.github.io/assets/publications/wanyu-cvpr2022.pdf \n- Code: https://github.com/WanyuGroup/CVPR2022-OrphicX\n\n<a name=\"MLP\"></a>\n\n# MLP\n\n**RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality**\n\n- Paper: https://arxiv.org/abs/2112.11081\n- Code: https://github.com/DingXiaoH/RepMLP\n\n<a name=\"NAS\"></a>\n\n# NAS\n\n**β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search**\n\n- Paper: https://arxiv.org/abs/2203.01665\n- Code: https://github.com/Sunshine-Ye/Beta-DARTS\n\n**ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior**\n\n- Paper: https://arxiv.org/abs/2111.15362\n- Code: None\n\n<a name=\"OCR\"></a>\n\n# OCR\n\n**SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition**\n\n- Paper: https://arxiv.org/abs/2203.10209\n\n- Code: https://github.com/mxin262/SwinTextSpotter\n\n<a name=\"NeRF\"></a>\n\n# NeRF\n\n**Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields**\n\n- Homepage: https://jonbarron.info/mipnerf360/\n- Paper: https://arxiv.org/abs/2111.12077\n\n- Demo: 
https://youtu.be/YStDS2-Ln1s\n\n**Point-NeRF: Point-based Neural Radiance Fields**\n\n- Homepage: https://xharlie.github.io/projects/project_sites/pointnerf/\n- Paper: https://arxiv.org/abs/2201.08845\n- Code: https://github.com/Xharlie/point-nerf\n\n**NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images**\n\n- Paper: https://arxiv.org/abs/2111.13679\n- Homepage: https://bmild.github.io/rawnerf/\n- Demo: https://www.youtube.com/watch?v=JtBS4KBcKVc\n\n**Urban Radiance Fields**\n\n- Homepage: https://urban-radiance-fields.github.io/\n\n- Paper: https://arxiv.org/abs/2111.14643\n- Demo: https://youtu.be/qGlq5DZT6uc\n\n**Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation**\n\n- Paper: https://arxiv.org/abs/2202.13162\n- Code: https://github.com/HexagonPrime/Pix2NeRF\n\n**HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video**\n\n- Homepage: https://grail.cs.washington.edu/projects/humannerf/\n- Paper: https://arxiv.org/abs/2201.04127\n\n- Demo: https://youtu.be/GM-RoZEymmw\n\n<a name=\"3D Face\"></a>\n\n# 3D Face\n\n**ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural Representations**\n\n- Paper: https://arxiv.org/abs/2203.14510\n- Code: https://github.com/MingwuZheng/ImFace\n\n<a name=\"Long-Tail\"></a>\n\n# Long-Tail\n\n**Retrieval Augmented Classification for Long-Tail Visual Recognition**\n\n- Paper: https://arxiv.org/abs/2202.11233\n- Code: None\n\n<a name=\"Visual-Transformer\"></a>\n\n# Visual Transformer\n\n## Backbone\n\n**MPViT: Multi-Path Vision Transformer for Dense Prediction**\n\n- Paper: https://arxiv.org/abs/2112.11010\n- Code: https://github.com/youngwanLEE/MPViT\n\n**MetaFormer is Actually What You Need for Vision**\n\n- Paper: https://arxiv.org/abs/2111.11418\n- Code: https://github.com/sail-sg/poolformer\n\n**Mobile-Former: Bridging MobileNet and Transformer**\n\n- Paper: https://arxiv.org/abs/2108.05895\n- Code: None\n- 
Chinese explainer: https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ\n\n**Shunted Self-Attention via Multi-Scale Token Aggregation**\n\n- Paper(Oral): https://arxiv.org/abs/2111.15193\n- Code: https://github.com/OliverRensu/Shunted-Transformer\n\n**Learned Queries for Efficient Local Attention**\n\n- Paper(Oral): https://arxiv.org/abs/2112.11435\n- Code: https://github.com/moabarar/qna\n\n## Application\n\n**Language-based Video Editing via Multi-Modal Multi-Level Transformer**\n\n- Paper: https://arxiv.org/abs/2104.01122\n- Code: None\n\n**MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video**\n\n- Paper: https://arxiv.org/abs/2203.00859\n- Code: None\n\n**Embracing Single Stride 3D Object Detector with Sparse Transformer**\n\n- Paper: https://arxiv.org/abs/2112.06375\n- Code: https://github.com/TuSimple/SST\n- Chinese explainer: https://zhuanlan.zhihu.com/p/476056546\n\n**Multi-class Token Transformer for Weakly Supervised Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2203.02891\n- Code: https://github.com/xulianuwa/MCTformer\n\n**Spatio-temporal Relation Modeling for Few-shot Action Recognition**\n\n- Paper: https://arxiv.org/abs/2112.05132\n- Code: https://github.com/Anirudh257/strm\n\n**Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction**\n\n- Paper: https://arxiv.org/abs/2111.07910\n- Code: https://github.com/caiyuanhao1998/MST\n\n**Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling**\n\n- Homepage: https://point-bert.ivg-research.xyz/\n- Paper: https://arxiv.org/abs/2111.14819\n- Code: https://github.com/lulutang0608/Point-BERT\n\n**GroupViT: Semantic Segmentation Emerges from Text Supervision**\n\n- Homepage: https://jerryxu.net/GroupViT/\n\n- Paper: https://arxiv.org/abs/2202.11094\n- Demo: https://youtu.be/DtJsWIUTW-Y\n\n**Restormer: Efficient Transformer for High-Resolution Image Restoration**\n\n- Paper: https://arxiv.org/abs/2111.09881\n- Code: 
https://github.com/swz30/Restormer\n\n**Splicing ViT Features for Semantic Appearance Transfer**\n\n- Homepage: https://splice-vit.github.io/\n- Paper: https://arxiv.org/abs/2201.00424\n- Code: https://github.com/omerbt/Splice\n\n**Self-supervised Video Transformer**\n\n- Homepage: https://kahnchana.github.io/svt/\n- Paper: https://arxiv.org/abs/2112.01514\n\n- Code: https://github.com/kahnchana/svt\n\n**Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers**\n\n- Paper: https://arxiv.org/abs/2203.02664\n- Code: https://github.com/rulixiang/afa\n\n**Accelerating DETR Convergence via Semantic-Aligned Matching**\n\n- Paper: https://arxiv.org/abs/2203.06883\n- Code: https://github.com/ZhangGongjie/SAM-DETR\n\n**DN-DETR: Accelerate DETR Training by Introducing Query DeNoising**\n\n- Paper: https://arxiv.org/abs/2203.01305\n- Code: https://github.com/FengLi-ust/DN-DETR\n- Chinese explainer: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w\n\n**Style Transformer for Image Inversion and Editing**\n\n- Paper: https://arxiv.org/abs/2203.07932\n- Code: https://github.com/sapphire497/style-transformer\n\n**MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer**\n\n- Paper: https://arxiv.org/abs/2203.10981\n\n- Code: https://github.com/kuanchihhuang/MonoDTR\n\n**Mask Transfiner for High-Quality Instance Segmentation**\n\n- Paper: https://arxiv.org/abs/2111.13673\n- Code: https://github.com/SysCV/transfiner\n\n**Language as Queries for Referring Video Object Segmentation**\n\n- Paper: https://arxiv.org/abs/2201.00487\n- Code: https://github.com/wjn922/ReferFormer\n- Chinese explainer: https://mp.weixin.qq.com/s/MkQT8QWSYoYVhJ1RSF6oPQ\n\n**X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning**\n\n- Paper: https://arxiv.org/abs/2203.00843\n- Code: https://github.com/CurryYuan/X-Trans2Cap\n\n**AdaMixer: A Fast-Converging Query-Based Object Detector**\n\n- Paper(Oral): https://arxiv.org/abs/2203.16507\n- Code: 
https://github.com/MCG-NJU/AdaMixer\n\n**Omni-DETR: Omni-Supervised Object Detection with Transformers**\n\n- Paper: https://arxiv.org/abs/2203.16089\n- Code: https://github.com/amazon-research/omni-detr\n\n**SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition**\n\n- Paper: https://arxiv.org/abs/2203.10209\n\n- Code: https://github.com/mxin262/SwinTextSpotter\n\n**TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting**\n\n- Paper(Oral): https://arxiv.org/abs/2204.01018\n- Code: https://github.com/SvipRepetitionCounting/TransRAC\n\n**Collaborative Transformers for Grounded Situation Recognition**\n\n- Paper: https://arxiv.org/abs/2203.16518\n- Code: https://github.com/jhcho99/CoFormer\n\n**NFormer: Robust Person Re-identification with Neighbor Transformer**\n\n- Paper: https://arxiv.org/abs/2204.09331\n- Code: https://github.com/haochenheheda/NFormer\n\n**Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation**\n\n- Paper: https://arxiv.org/abs/2201.06889\n- Code: None\n\n**Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer**\n\n- Paper(Oral): https://arxiv.org/abs/2204.08680\n- Code: https://github.com/zengwang430521/TCFormer\n\n**A New Dataset and Transformer for Stereoscopic Video Super-Resolution**\n\n- Paper: https://arxiv.org/abs/2204.10039\n- Code: https://github.com/H-deep/Trans-SVSR/\n- Dataset: http://shorturl.at/mpwGX\n\n**Safe Self-Refinement for Transformer-based Domain Adaptation**\n\n- Paper: https://arxiv.org/abs/2204.07683\n- Code: https://github.com/tsun/SSRT\n\n**Fast Point Transformer**\n\n- Homepage: http://cvlab.postech.ac.kr/research/FPT/\n- Paper: https://arxiv.org/abs/2112.04702\n- Code: https://github.com/POSTECH-CVLab/FastPointTransformer\n\n**Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval**\n\n- Paper: https://arxiv.org/abs/2204.09730\n- 
Code: https://github.com/mshukor/TFood\n\n**DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2111.14887\n- Code: https://github.com/lhoyer/DAFormer\n\n**Stratified Transformer for 3D Point Cloud Segmentation**\n\n- Paper: https://arxiv.org/pdf/2203.14508.pdf\n- Code: https://github.com/dvlab-research/Stratified-Transformer\n\n<a name=\"VL\"></a>\n\n# Vision-Language\n\n**Conditional Prompt Learning for Vision-Language Models**\n\n- Paper: https://arxiv.org/abs/2203.05557\n- Code: https://github.com/KaiyangZhou/CoOp\n\n**Bridging Video-text Retrieval with Multiple Choice Question**\n\n- Paper: https://arxiv.org/abs/2201.04850\n- Code: https://github.com/TencentARC/MCQ\n\n**Visual Abductive Reasoning**\n\n- Paper: https://arxiv.org/abs/2203.14040\n- Code: https://github.com/leonnnop/VAR\n\n<a name=\"SSL\"></a>\n\n# Self-supervised Learning\n\n**UniVIP: A Unified Framework for Self-Supervised Visual Pre-training**\n\n- Paper: https://arxiv.org/abs/2203.06965\n- Code: None\n\n**Crafting Better Contrastive Views for Siamese Representation Learning**\n\n- Paper: https://arxiv.org/abs/2202.03278\n- Code: https://github.com/xyupeng/ContrastiveCrop\n- Chinese explainer: https://mp.weixin.qq.com/s/VTP9D5f7KG9vg30U9kVI2A\n\n**HCSC: Hierarchical Contrastive Selective Coding**\n\n- Homepage: https://github.com/gyfastas/HCSC\n- Paper: https://arxiv.org/abs/2202.00455\n- Chinese explainer: https://mp.weixin.qq.com/s/jkYR8mYp-e645qk8kfPNKQ\n\n**DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis**\n\n- Paper: https://arxiv.org/abs/2204.10437\n\n- Code: https://github.com/JLiangLab/DiRA\n\n<a name=\"DA\"></a>\n\n# Data Augmentation\n\n**TeachAugment: Data Augmentation Optimization Using Teacher Knowledge**\n\n- Paper: https://arxiv.org/abs/2202.12513\n- Code: https://github.com/DensoITLab/TeachAugment\n\n**AlignMixup: Improving Representations 
By Interpolating Aligned Features**\n\n- Paper: https://arxiv.org/abs/2103.15375\n- Code: https://github.com/shashankvkt/AlignMixup_CVPR22\n\n<a name=\"KD\"></a>\n\n# Knowledge Distillation\n\n**Decoupled Knowledge Distillation**\n\n- Paper: https://arxiv.org/abs/2203.08679\n- Code: https://github.com/megvii-research/mdistiller\n- Chinese explainer: https://mp.weixin.qq.com/s/-4AA0zKIXh9Ei9-vc5jOhw\n\n<a name=\"Object-Detection\"></a>\n\n# Object Detection\n\n**BoxeR: Box-Attention for 2D and 3D Transformers**\n\n- Paper: https://arxiv.org/abs/2111.13087\n- Code: https://github.com/kienduynguyen/BoxeR\n- Chinese explainer: https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w\n\n**DN-DETR: Accelerate DETR Training by Introducing Query DeNoising**\n\n- Paper: https://arxiv.org/abs/2203.01305\n- Code: https://github.com/FengLi-ust/DN-DETR\n- Chinese explainer: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w\n\n**Accelerating DETR Convergence via Semantic-Aligned Matching**\n\n- Paper: https://arxiv.org/abs/2203.06883\n- Code: https://github.com/ZhangGongjie/SAM-DETR\n\n**Localization Distillation for Dense Object Detection**\n\n- Paper: https://arxiv.org/abs/2102.12252\n- Code: https://github.com/HikariTJU/LD\n- Chinese explainer: https://mp.weixin.qq.com/s/dxss8RjJH283h6IbPCT9vg\n\n**Focal and Global Knowledge Distillation for Detectors**\n\n- Paper: https://arxiv.org/abs/2111.11837\n- Code: https://github.com/yzd-v/FGD\n- Chinese explainer: https://mp.weixin.qq.com/s/yDkreTudC8JL2V2ETsADwQ\n\n**A Dual Weighting Label Assignment Scheme for Object Detection**\n\n- Paper: https://arxiv.org/abs/2203.09730\n- Code: https://github.com/strongwolf/DW\n\n**AdaMixer: A Fast-Converging Query-Based Object Detector**\n\n- Paper(Oral): https://arxiv.org/abs/2203.16507\n- Code: https://github.com/MCG-NJU/AdaMixer\n\n**Omni-DETR: Omni-Supervised Object Detection with Transformers**\n\n- Paper: https://arxiv.org/abs/2203.16089\n- Code: https://github.com/amazon-research/omni-detr\n\n**SIGMA: 
Semantic-complete Graph Matching for Domain Adaptive Object Detection**\n\n- Paper(Oral): https://arxiv.org/abs/2203.06398\n- Code: https://github.com/CityU-AIM-Group/SIGMA\n\n## Semi-Supervised Object Detection\n\n**Dense Learning based Semi-Supervised Object Detection**\n\n- Paper: https://arxiv.org/abs/2204.07300\n\n- Code: https://github.com/chenbinghui1/DSL\n\n<a name=\"VT\"></a>\n\n# Visual Tracking\n\n**Correlation-Aware Deep Tracking**\n\n- Paper: https://arxiv.org/abs/2203.01666\n- Code: None\n\n**TCTrack: Temporal Contexts for Aerial Tracking**\n\n- Paper: https://arxiv.org/abs/2203.01885\n- Code: https://github.com/vision4robotics/TCTrack\n\n## Multi-Modal Object Tracking\n\n**Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline**\n\n- Homepage: https://zhang-pengyu.github.io/DUT-VTUAV/\n\n- Paper: https://arxiv.org/abs/2204.04120\n\n## Multi-Object Tracking\n\n**Learning of Global Objective for Network Flow in Multi-Object Tracking**\n\n- Paper: https://arxiv.org/abs/2203.16210\n- Code: None\n\n**DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion**\n\n- Homepage: https://dancetrack.github.io\n- Paper: https://arxiv.org/abs/2111.14690\n- Dataset: https://github.com/DanceTrack/DanceTrack\n\n<a name=\"Semantic-Segmentation\"></a>\n\n# Semantic Segmentation\n\n**Novel Class Discovery in Semantic Segmentation**\n\n- Homepage: https://ncdss.github.io/\n- Paper: https://arxiv.org/abs/2112.01900\n- Code: https://github.com/HeliosZhao/NCDSS\n\n**Deep Hierarchical Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2203.14335\n- Code: https://github.com/0liliulei/HieraSeg\n\n**Rethinking Semantic Segmentation: A Prototype View**\n\n- Paper(Oral): https://arxiv.org/abs/2203.15102\n- Code: https://github.com/tfzhou/ProtoSeg\n\n## Weakly Supervised Semantic Segmentation\n\n**Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2203.00962\n- Code: https://github.com/zhaozhengChen/ReCAM\n\n**Multi-class Token Transformer for Weakly Supervised 
Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2203.02891\n- Code: https://github.com/xulianuwa/MCTformer\n\n**Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers**\n\n- Paper: https://arxiv.org/abs/2203.02664\n- Code: https://github.com/rulixiang/afa\n\n**CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2203.02668\n- Code: https://github.com/CVI-SZU/CLIMS \n\n**CCAM: Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2203.13505\n- Code: https://github.com/CVI-SZU/CCAM \n\n**FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation**\n\n- Homepage: http://cvlab.postech.ac.kr/research/FIFO/\n- Paper(Oral): https://arxiv.org/abs/2204.01587\n- Code: https://github.com/sohyun-l/FIFO \n\n**Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2203.09653\n- Code: https://github.com/maeve07/RCA.git\n\n## 半监督语义分割\n\n**ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2106.05095\n- Code: https://github.com/LiheYoung/ST-PlusPlus\n- 中文解读：https://mp.weixin.qq.com/s/knSnlebdtEnmrkChGM_0CA\n\n**Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels**\n\n- Homepage: https://haochen-wang409.github.io/U2PL/\n- Paper: https://arxiv.org/abs/2203.03884\n- Code: https://github.com/Haochen-Wang409/U2PL\n- 中文解读: https://mp.weixin.qq.com/s/-08olqE7np8A1XQzt6HAgQ\n\n**Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation**\n\n- Paper: https://arxiv.org/pdf/2111.12903.pdf\n- Code: https://github.com/yyliu01/PS-MT\n\n## 域自适应语义分割\n\n**Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation**\n\n- Paper: 
https://arxiv.org/abs/2111.12940\n- Code: https://github.com/BIT-DA/RIPU\n\n**DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2111.14887\n- Code: https://github.com/lhoyer/DAFormer\n\n## 无监督语义分割\n\n**GroupViT: Semantic Segmentation Emerges from Text Supervision**\n\n- Homepage: https://jerryxu.net/GroupViT/\n- Paper: https://arxiv.org/abs/2202.11094\n- Demo: https://youtu.be/DtJsWIUTW-Y\n\n## 少样本语义分割\n\n**Generalized Few-shot Semantic Segmentation**\n\n- Paper: https://jiaya.me/papers/cvpr22_zhuotao.pdf\n- Code: https://github.com/dvlab-research/GFS-Seg \n\n<a name=\"Instance-Segmentation\"></a>\n\n# 实例分割(Instance Segmentation)\n\n**BoxeR: Box-Attention for 2D and 3D Transformers**\n- Paper: https://arxiv.org/abs/2111.13087\n- Code: https://github.com/kienduynguyen/BoxeR\n- 中文解读：https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w\n\n**E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation**\n\n- Paper: https://arxiv.org/abs/2203.04074\n- Code: https://github.com/zhang-tao-whu/e2ec\n\n**Mask Transfiner for High-Quality Instance Segmentation**\n\n- Paper: https://arxiv.org/abs/2111.13673\n- Code: https://github.com/SysCV/transfiner\n\n**Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity**\n\n- Homepage: https://sites.google.com/view/generic-grouping/\n\n- Paper: https://arxiv.org/abs/2204.06107\n- Code: https://github.com/facebookresearch/Generic-Grouping\n\n## 自监督实例分割\n\n**FreeSOLO: Learning to Segment Objects without Annotations**\n\n- Paper: https://arxiv.org/abs/2202.12181\n- Code: https://github.com/NVlabs/FreeSOLO\n\n## 视频实例分割\n\n**Efficient Video Instance Segmentation via Tracklet Query and Proposal**\n\n- Homepage: https://jialianwu.com/projects/EfficientVIS.html\n- Paper: https://arxiv.org/abs/2203.01853\n- Demo: https://youtu.be/sSPMzgtMKCE\n\n**Temporally Efficient Vision Transformer 
for Video Instance Segmentation**\n\n- Paper: https://arxiv.org/abs/2204.08412\n- Code: https://github.com/hustvl/TeViT\n\n<a name=\"Panoptic-Segmentation\"></a>\n\n# 全景分割(Panoptic Segmentation)\n\n**Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers**\n\n- Paper: https://arxiv.org/abs/2109.03814\n- Code: https://github.com/zhiqi-li/Panoptic-SegFormer\n\n**Large-scale Video Panoptic Segmentation in the Wild: A Benchmark**\n\n- Paper: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset/blob/main/VIPSeg2022.pdf\n- Code: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset\n- Dataset: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset \n\n<a name=\"FFC\"></a>\n\n# 小样本分类(Few-Shot Classification)\n\n**Integrative Few-Shot Learning for Classification and Segmentation**\n\n- Paper: https://arxiv.org/abs/2203.15712\n- Code: https://github.com/dahyun-kang/ifsl\n\n**Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification**\n\n- Paper: https://arxiv.org/abs/2106.05517\n- Code: https://github.com/LouieYang/MCL\n\n<a name=\"FFS\"></a>\n\n# 小样本分割(Few-Shot Segmentation)\n\n**Learning What Not to Segment: A New Perspective on Few-Shot Segmentation**\n\n- Paper: https://arxiv.org/abs/2203.07615\n- Code: https://github.com/chunbolang/BAM\n\n**Integrative Few-Shot Learning for Classification and Segmentation**\n\n- Paper: https://arxiv.org/abs/2203.15712\n- Code: https://github.com/dahyun-kang/ifsl\n\n**Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2204.10638\n- Code: None\n\n<a name=\"Matting\"></a>\n\n# 图像抠图(Image Matting)\n\n**Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation**\n\n- Paper: https://arxiv.org/abs/2201.06889\n- Code: None\n\n<a name=\"VU\"></a>\n\n# 视频理解(Video Understanding)\n\n**Self-supervised Video Transformer**\n\n- Homepage: https://kahnchana.github.io/svt/\n- Paper: https://arxiv.org/abs/2112.01514\n- Code: 
https://github.com/kahnchana/svt\n\n**TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting**\n\n- Paper(Oral): https://arxiv.org/abs/2204.01018\n- Code: https://github.com/SvipRepetitionCounting/TransRAC\n\n**FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment**\n\n- Paper(Oral): https://arxiv.org/abs/2204.03646\n\n- Dataset: https://github.com/xujinglin/FineDiving\n- Code: https://github.com/xujinglin/FineDiving\n- 中文解读：https://mp.weixin.qq.com/s/8t12Y34eMNwvJr8PeryWXg\n\n**Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition**\n\n- Paper(Oral): https://arxiv.org/abs/2204.02148\n- Code: None\n\n## 行为识别(Action Recognition)\n\n**Spatio-temporal Relation Modeling for Few-shot Action Recognition**\n\n- Paper: https://arxiv.org/abs/2112.05132\n- Code: https://github.com/Anirudh257/strm\n\n## 动作检测(Action Detection)\n\n**End-to-End Semi-Supervised Learning for Video Action Detection**\n\n- Paper: https://arxiv.org/abs/2203.04251\n- Code: None\n\n<a name=\"Image-Editing\"></a>\n\n# 图像编辑(Image Editing)\n\n**Style Transformer for Image Inversion and Editing**\n\n- Paper: https://arxiv.org/abs/2203.07932\n- Code: https://github.com/sapphire497/style-transformer\n\n**Blended Diffusion for Text-driven Editing of Natural Images**\n\n- Paper: https://arxiv.org/abs/2111.14818\n- Code: https://github.com/omriav/blended-diffusion\n\n**SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing**\n\n- Homepage: https://semanticstylegan.github.io/\n\n- Paper: https://arxiv.org/abs/2112.02236\n- Demo: https://semanticstylegan.github.io/videos/demo.mp4\n\n<a name=\"LLV\"></a>\n\n# Low-level Vision\n\n**ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior**\n\n- Paper: https://arxiv.org/abs/2111.15362\n- Code: None\n\n**Restormer: Efficient Transformer for High-Resolution Image Restoration**\n\n- Paper: 
https://arxiv.org/abs/2111.09881\n- Code: https://github.com/swz30/Restormer\n\n**Robust Equivariant Imaging: a fully unsupervised framework for learning to image from noisy and partial measurements**\n\n- Paper(Oral): https://arxiv.org/abs/2111.12855\n- Code: https://github.com/edongdongchen/REI\n\n<a name=\"Super-Resolution\"></a>\n\n# 超分辨率(Super-Resolution)\n\n## 图像超分辨率(Image Super-Resolution)\n\n**Learning the Degradation Distribution for Blind Image Super-Resolution**\n\n- Paper: https://arxiv.org/abs/2203.04962\n- Code: https://github.com/greatlog/UnpairedSR\n\n## 视频超分辨率(Video Super-Resolution)\n\n**BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment**\n\n- Paper: https://arxiv.org/abs/2104.13371\n- Code: https://github.com/open-mmlab/mmediting\n- Code: https://github.com/ckkelvinchan/BasicVSR_PlusPlus\n- 中文解读：https://mp.weixin.qq.com/s/HZTwYfphixyLHxlbCAxx4g\n\n**Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling**\n\n- Paper: https://arxiv.org/abs/2204.07114\n- Code: None\n\n**A New Dataset and Transformer for Stereoscopic Video Super-Resolution**\n\n- Paper: https://arxiv.org/abs/2204.10039\n- Code: https://github.com/H-deep/Trans-SVSR/\n- Dataset: http://shorturl.at/mpwGX\n\n<a name=\"Deblur\"></a>\n\n# 去模糊(Deblur)\n\n## 图像去模糊(Image Deblur)\n\n**Learning to Deblur using Light Field Generated and Real Defocus Images**\n\n- Homepage: http://lyruan.com/Projects/DRBNet/\n- Paper(Oral): https://arxiv.org/abs/2204.00442\n\n- Code: https://github.com/lingyanruan/DRBNet\n\n<a name=\"3D-Point-Cloud\"></a>\n\n# 3D点云(3D Point Cloud)\n\n**Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling**\n\n- Homepage: https://point-bert.ivg-research.xyz/\n\n- Paper: https://arxiv.org/abs/2111.14819\n- Code: https://github.com/lulutang0608/Point-BERT\n\n**A Unified Query-based Paradigm for Point Cloud Understanding**\n\n- Paper: https://arxiv.org/abs/2203.01252\n- Code: None 
\n\n**CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding**\n\n- Paper: https://arxiv.org/abs/2203.00680\n- Code: https://github.com/MohamedAfham/CrossPoint\n\n**PointCLIP: Point Cloud Understanding by CLIP**\n\n- Paper: https://arxiv.org/abs/2112.02413\n- Code: https://github.com/ZrrSkywalker/PointCLIP\n\n**Fast Point Transformer**\n\n- Homepage: http://cvlab.postech.ac.kr/research/FPT/\n- Paper: https://arxiv.org/abs/2112.04702\n- Code: https://github.com/POSTECH-CVLab/FastPointTransformer\n\n**RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds**\n\n- Paper: https://arxiv.org/abs/2205.11028\n- Code: https://github.com/gxd1994/RCP\n\n**The Devil is in the Pose: Ambiguity-free 3D Rotation-invariant Learning via Pose-aware Convolution**\n\n- Paper: https://arxiv.org/abs/2205.15210\n- Code: https://github.com/GostInShell/PaRI-Conv \n\n<a name=\"3D-Object-Detection\"></a>\n\n# 3D目标检测(3D Object Detection)\n\n**Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds**\n\n- Paper(Oral): https://arxiv.org/abs/2203.11139\n\n- Code: https://github.com/yifanzhang713/IA-SSD\n\n- Demo: https://www.youtube.com/watch?v=3jP2o9KXunA\n\n**BoxeR: Box-Attention for 2D and 3D Transformers**\n- Paper: https://arxiv.org/abs/2111.13087\n- Code: https://github.com/kienduynguyen/BoxeR\n- 中文解读：https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w\n\n**Embracing Single Stride 3D Object Detector with Sparse Transformer**\n\n- Paper: https://arxiv.org/abs/2112.06375\n\n- Code: https://github.com/TuSimple/SST\n\n**Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes** \n\n- Paper: https://arxiv.org/abs/2011.12001\n- Code: https://github.com/qq456cvb/CanonicalVoting\n\n**MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer**\n\n- Paper: https://arxiv.org/abs/2203.10981\n- Code: https://github.com/kuanchihhuang/MonoDTR\n\n**HyperDet3D: Learning a 
Scene-conditioned 3D Object Detector**\n\n- Paper: https://arxiv.org/abs/2204.05599\n- Code: None\n\n**OccAM's Laser: Occlusion-based Attribution Maps for 3D Object Detectors on LiDAR Data**\n\n- Paper: https://arxiv.org/abs/2204.06577\n- Code: https://github.com/dschinagl/occam\n\n**DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection**\n\n- Homepage: https://thudair.baai.ac.cn/index\n- Paper: https://arxiv.org/abs/2204.05575\n- Code: https://github.com/AIR-THU/DAIR-V2X\n\n**Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions**\n\n- Homepage: https://ithaca365.mae.cornell.edu/\n\n- Paper: https://arxiv.org/abs/2208.01166\n\n<a name=\"3DSS\"></a>\n\n# 3D语义分割(3D Semantic Segmentation)\n\n**Scribble-Supervised LiDAR Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2203.08537\n- Dataset: https://github.com/ouenal/scribblekitti\n\n**Stratified Transformer for 3D Point Cloud Segmentation**\n\n- Paper: https://arxiv.org/pdf/2203.14508.pdf\n- Code: https://github.com/dvlab-research/Stratified-Transformer\n\n# 3D实例分割(3D Instance Segmentation)\n\n**Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions**\n\n- Homepage: https://ithaca365.mae.cornell.edu/\n\n- Paper: https://arxiv.org/abs/2208.01166\n\n<a name=\"3D-Object-Tracking\"></a>\n\n# 3D目标跟踪(3D Object Tracking)\n\n**Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds**\n\n- Paper: https://arxiv.org/abs/2203.01730\n- Code: https://github.com/Ghostish/Open3DSOT\n\n**PTTR: Relational 3D Point Cloud Object Tracking with Transformer**\n\n- Paper: https://arxiv.org/abs/2112.02857\n- Code: https://github.com/Jasonkks/PTTR \n\n<a name=\"3D-Human-Pose-Estimation\"></a>\n\n# 3D人体姿态估计(3D Human Pose Estimation)\n\n**MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation**\n\n- Paper: https://arxiv.org/abs/2111.12707\n\n- Code: 
https://github.com/Vegetebird/MHFormer\n\n- 中文解读: https://zhuanlan.zhihu.com/p/439459426\n\n**MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video**\n\n- Paper: https://arxiv.org/abs/2203.00859\n- Code: None\n\n**Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation**\n\n- Paper: https://arxiv.org/abs/2203.07697\n- Code: None\n- 中文解读：https://mp.weixin.qq.com/s/L_F28IFLXvs5R4V9TTUpRw\n\n**BEV: Putting People in their Place: Monocular Regression of 3D People in Depth**\n\n- Homepage: https://arthur151.github.io/BEV/BEV.html\n- Paper: https://arxiv.org/abs/2112.08274\n- Code: https://github.com/Arthur151/ROMP\n- Dataset: https://github.com/Arthur151/Relative_Human\n- Demo: https://www.youtube.com/watch?v=Q62fj_6AxRI\n\n<a name=\"3DSSC\"></a>\n\n# 3D语义场景补全(3D Semantic Scene Completion)\n\n**MonoScene: Monocular 3D Semantic Scene Completion**\n\n- Paper: https://arxiv.org/abs/2112.00726\n- Code: https://github.com/cv-rits/MonoScene\n\n<a name=\"3D-R\"></a>\n\n# 3D重建(3D Reconstruction)\n\n**BANMo: Building Animatable 3D Neural Models from Many Casual Videos**\n\n- Homepage: https://banmo-www.github.io/\n- Paper: https://arxiv.org/abs/2112.12761\n- Code: https://github.com/facebookresearch/banmo\n- 中文解读：https://mp.weixin.qq.com/s/NMHP8-xWwrX40vpGx55Qew\n\n<a name=\"ReID\"></a>\n\n# 行人重识别(Person Re-identification)\n\n**NFormer: Robust Person Re-identification with Neighbor Transformer**\n\n- Paper: https://arxiv.org/abs/2204.09331\n- Code: https://github.com/haochenheheda/NFormer\n\n<a name=\"COD\"></a>\n\n# 伪装物体检测(Camouflaged Object Detection)\n\n**Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection**\n\n- Paper: https://arxiv.org/abs/2203.02688\n- Code: https://github.com/lartpang/ZoomNet\n\n<a name=\"Depth-Estimation\"></a>\n\n# 深度估计(Depth Estimation)\n\n## 单目深度估计\n\n**NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation**\n\n- Paper: 
https://arxiv.org/abs/2203.01502\n- Code: None\n\n**OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion**\n\n- Paper: https://arxiv.org/abs/2203.00838\n- Code: None\n\n**Toward Practical Self-Supervised Monocular Indoor Depth Estimation**\n\n- Paper: https://arxiv.org/abs/2112.02306\n- Code: None\n\n**P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior**\n\n- Paper: https://arxiv.org/abs/2204.02091\n- Code: https://github.com/SysCV/P3Depth\n\n**Multi-Frame Self-Supervised Depth with Transformers**\n\n- Homepage: https://sites.google.com/tri.global/depthformer\n\n- Paper: https://arxiv.org/abs/2204.07616\n- Code: None\n\n<a name=\"Stereo-Matching\"></a>\n\n# 立体匹配(Stereo Matching)\n\n**ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching**\n\n- Paper: https://arxiv.org/abs/2203.02146\n- Code: https://github.com/gangweiX/ACVNet\n\n<a name=\"FM\"></a>\n\n# 特征匹配(Feature Matching)\n\n**ClusterGNN: Cluster-based Coarse-to-Fine Graph Neural Network for Efficient Feature Matching**\n\n- Paper: https://arxiv.org/abs/2204.11700\n- Code: None\n\n<a name=\"Lane-Detection\"></a>\n\n# 车道线检测(Lane Detection)\n\n**Rethinking Efficient Lane Detection via Curve Modeling**\n\n- Paper: https://arxiv.org/abs/2203.02431\n- Code: https://github.com/voldemortX/pytorch-auto-drive\n- Demo：https://user-images.githubusercontent.com/32259501/148680744-a18793cd-f437-461f-8c3a-b909c9931709.mp4\n\n**A Keypoint-based Global Association Network for Lane Detection**\n\n- Paper: https://arxiv.org/abs/2204.07335\n- Code: https://github.com/Wolfwjs/GANet\n\n<a name=\"Optical-Flow-Estimation\"></a>\n\n# 光流估计(Optical Flow Estimation)\n\n**Imposing Consistency for Optical Flow Estimation**\n\n- Paper: https://arxiv.org/abs/2204.07262\n- Code: None\n\n**Deep Equilibrium Optical Flow Estimation**\n\n- Paper: https://arxiv.org/abs/2204.08442\n- Code: https://github.com/locuslab/deq-flow\n\n**GMFlow: Learning Optical Flow via Global Matching**\n\n- 
Paper(Oral): https://arxiv.org/abs/2111.13680\n- Code: https://github.com/haofeixu/gmflow\n\n<a name=\"Image-Inpainting\"></a>\n\n# 图像修复(Image Inpainting)\n\n**Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding**\n\n- Paper: https://arxiv.org/abs/2203.00867\n\n- Code: https://github.com/DQiaole/ZITS_inpainting\n\n<a name=\"Image-Retrieval\"></a>\n\n# 图像检索(Image Retrieval)\n\n**Correlation Verification for Image Retrieval**\n\n- Paper(Oral): https://arxiv.org/abs/2204.01458\n- Code: https://github.com/sungonce/CVNet\n\n<a name=\"Face-Recognition\"></a>\n\n# 人脸识别(Face Recognition)\n\n**AdaFace: Quality Adaptive Margin for Face Recognition**\n\n- Paper(Oral): https://arxiv.org/abs/2204.00964 \n- Code: https://github.com/mk-minchul/AdaFace\n\n<a name=\"Crowd-Counting\"></a>\n\n# 人群计数(Crowd Counting)\n\n**Leveraging Self-Supervision for Cross-Domain Crowd Counting**\n\n- Paper: https://arxiv.org/abs/2103.16291\n- Code: None\n\n<a name=\"Medical-Image\"></a>\n\n# 医学图像(Medical Image)\n\n**BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation**\n\n- Paper: https://arxiv.org/abs/2203.02533\n- Code: None\n\n**Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification**\n\n- Paper: https://arxiv.org/abs/2111.12918\n- Code: https://github.com/FBLADL/ACPL\n\n**DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis**\n\n- Paper: https://arxiv.org/abs/2204.10437\n\n- Code: https://github.com/JLiangLab/DiRA\n\n<a name=\"Video Generation\"></a>\n\n# 视频生成(Video Generation)\n\n**StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2**\n\n- Homepage: https://universome.github.io/stylegan-v\n- Paper: https://arxiv.org/abs/2112.14683\n\n- Code: https://github.com/universome/stylegan-v\n\n- Demo: https://kaust-cair.s3.amazonaws.com/stylegan-v/stylegan-v.mp4\n\n<a 
name=\"Scene-Graph-Generation\"></a>\n\n# 场景图生成(Scene Graph Generation)\n\n**SGTR: End-to-end Scene Graph Generation with Transformer**\n\n- Paper: https://arxiv.org/abs/2112.12970\n- Code: None\n\n<a name=\"R-VOS\"></a>\n\n# 参考视频目标分割(Referring Video Object Segmentation)\n\n**Language as Queries for Referring Video Object Segmentation**\n\n- Paper: https://arxiv.org/abs/2201.00487\n- Code: https://github.com/wjn922/ReferFormer\n\n**ReSTR: Convolution-free Referring Image Segmentation Using Transformers**\n\n- Paper: https://arxiv.org/abs/2203.16768\n- Code: None\n\n<a name=\"GR\"></a>\n\n# 步态识别(Gait Recognition)\n\n**Gait Recognition in the Wild with Dense 3D Representations and A Benchmark**\n\n- Homepage: https://gait3d.github.io/\n- Paper: https://arxiv.org/abs/2204.02569\n- Code: https://github.com/Gait3D/Gait3D-Benchmark\n\n<a name=\"ST\"></a>\n\n# 风格迁移(Style Transfer)\n\n**StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions**\n\n- Homepage: https://lukashoel.github.io/stylemesh/\n- Paper: https://arxiv.org/abs/2112.01530\n\n- Code: https://github.com/lukasHoel/stylemesh\n- Demo：https://www.youtube.com/watch?v=ZqgiTLcNcks\n\n<a name=\"AD\"></a>\n\n# 异常检测(Anomaly Detection)\n\n**UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection**\n\n- Paper: https://arxiv.org/abs/2111.08644\n\n- Dataset: https://github.com/lilygeorgescu/UBnormal\n\n**Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection**\n\n- Paper(Oral): https://arxiv.org/abs/2111.09099\n- Code: https://github.com/ristea/sspcab\n\n<a name=\"AE\"></a>\n\n# 对抗样本(Adversarial Examples)\n\n**Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon**\n\n- Paper: https://arxiv.org/abs/2203.03818\n- Code: https://github.com/hncszyq/ShadowAttack\n\n**LAS-AT: Adversarial Training with Learnable Attack Strategy**\n\n- Paper(Oral): https://arxiv.org/abs/2203.06616\n- Code: 
https://github.com/jiaxiaojunQAQ/LAS-AT\n\n**Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection**\n\n- Paper: https://arxiv.org/abs/2112.04532\n- Code: https://github.com/joellliu/SegmentAndComplete\n\n<a name=\"WSOL\"></a>\n\n# 弱监督物体检测(Weakly Supervised Object Localization)\n\n**Weakly Supervised Object Localization as Domain Adaption**\n\n- Paper: https://arxiv.org/abs/2203.01714\n- Code: https://github.com/zh460045050/DA-WSOL_CVPR2022\n\n<a name=\"ROD\"></a>\n\n# 雷达目标检测(Radar Object Detection)\n\n**Exploiting Temporal Relations on Radar Perception for Autonomous Driving**\n\n- Paper: https://arxiv.org/abs/2204.01184\n- Code: None\n\n<a name=\"HSI\"></a>\n\n# 高光谱图像重建(Hyperspectral Image Reconstruction)\n\n**Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction**\n\n- Paper: https://arxiv.org/abs/2111.07910\n- Code: https://github.com/caiyuanhao1998/MST\n\n<a name=\"Image-Stitching\"></a>\n\n# 图像拼接(Image Stitching)\n\n**Deep Rectangling for Image Stitching: A Learning Baseline**\n\n- Paper(Oral): https://arxiv.org/abs/2203.03831\n\n- Code: https://github.com/nie-lang/DeepRectangling\n- Dataset: https://github.com/nie-lang/DeepRectangling\n- 中文解读：https://mp.weixin.qq.com/s/lp5AnrtO_9urp-Fv6Z0l2Q\n\n<a name=\"Watermarking\"></a>\n\n# 水印(Watermarking)\n\n**Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings**\n\n- Paper: https://arxiv.org/abs/2104.13450\n- Code: None\n\n<a name=\"AC\"></a>\n\n# Action Counting\n\n**TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting**\n\n- Paper(Oral): https://arxiv.org/abs/2204.01018\n- Dataset: https://svip-lab.github.io/dataset/RepCount_dataset.html\n- Code: https://github.com/SvipRepetitionCounting/TransRAC\n\n<a name=\"GSR\"></a>\n\n# Grounded Situation Recognition\n\n**Collaborative Transformers for Grounded Situation Recognition**\n\n- 
Paper: https://arxiv.org/abs/2203.16518\n- Code: https://github.com/jhcho99/CoFormer\n\n<a name=\"ZSL\"></a>\n\n# Zero-shot Learning\n\n**Unseen Classes at a Later Time? No Problem**\n\n- Paper: https://arxiv.org/abs/2203.16517\n- Code: https://github.com/sumitramalagi/Unseen-classes-at-a-later-time\n\n<a name=\"DeepFakes\"></a>\n\n# DeepFakes\n\n**Detecting Deepfakes with Self-Blended Images**\n\n- Paper(Oral): https://arxiv.org/abs/2204.08376\n\n- Code: https://github.com/mapooon/SelfBlendedImages\n\n<a name=\"Datasets\"></a>\n\n# 数据集(Datasets)\n\n**It's About Time: Analog Clock Reading in the Wild**\n\n- Homepage: https://charigyang.github.io/abouttime/\n- Paper: https://arxiv.org/abs/2111.09162\n- Code: https://github.com/charigyang/itsabouttime\n- Demo: https://youtu.be/cbiMACA6dRc\n\n**Toward Practical Self-Supervised Monocular Indoor Depth Estimation**\n\n- Paper: https://arxiv.org/abs/2112.02306\n- Code: None\n\n**Kubric: A scalable dataset generator**\n\n- Paper: https://arxiv.org/abs/2203.03570\n- Code: https://github.com/google-research/kubric\n- 中文解读：https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg\n\n**Scribble-Supervised LiDAR Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2203.08537\n- Dataset: https://github.com/ouenal/scribblekitti\n\n**Deep Rectangling for Image Stitching: A Learning Baseline**\n\n- Paper(Oral): https://arxiv.org/abs/2203.03831\n- Code: https://github.com/nie-lang/DeepRectangling\n- Dataset: https://github.com/nie-lang/DeepRectangling\n- 中文解读：https://mp.weixin.qq.com/s/lp5AnrtO_9urp-Fv6Z0l2Q\n\n**ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer**\n\n- Homepage: https://ai.stanford.edu/~rhgao/objectfolder2.0/\n- Paper: https://arxiv.org/abs/2204.02389\n- Dataset: https://github.com/rhgao/ObjectFolder\n- Demo：https://youtu.be/e5aToT3LkRA\n\n**Shape from Polarization for Complex Scenes in the Wild**\n\n- Homepage: https://chenyanglei.github.io/sfpwild/index.html\n- Paper: 
https://arxiv.org/abs/2112.11377\n- Code: https://github.com/ChenyangLEI/sfp-wild\n\n**Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline**\n\n- Homepage: https://zhang-pengyu.github.io/DUT-VTUAV/\n- Paper: https://arxiv.org/abs/2204.04120\n\n**TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting**\n\n- Paper(Oral): https://arxiv.org/abs/2204.01018\n- Dataset: https://svip-lab.github.io/dataset/RepCount_dataset.html\n- Code: https://github.com/SvipRepetitionCounting/TransRAC\n\n**FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment**\n\n- Paper(Oral): https://arxiv.org/abs/2204.03646\n- Dataset: https://github.com/xujinglin/FineDiving\n- Code: https://github.com/xujinglin/FineDiving\n- 中文解读：https://mp.weixin.qq.com/s/8t12Y34eMNwvJr8PeryWXg\n\n**Aesthetic Text Logo Synthesis via Content-aware Layout Inferring**\n\n- Paper: https://arxiv.org/abs/2204.02701\n- Dataset: https://github.com/yizhiwang96/TextLogoLayout\n- Code: https://github.com/yizhiwang96/TextLogoLayout\n\n**DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection**\n\n- Homepage: https://thudair.baai.ac.cn/index\n- Paper: https://arxiv.org/abs/2204.05575\n- Code: https://github.com/AIR-THU/DAIR-V2X\n\n**A New Dataset and Transformer for Stereoscopic Video Super-Resolution**\n\n- Paper: https://arxiv.org/abs/2204.10039\n- Code: https://github.com/H-deep/Trans-SVSR/\n- Dataset: http://shorturl.at/mpwGX\n\n**Putting People in their Place: Monocular Regression of 3D People in Depth**\n\n- Homepage: https://arthur151.github.io/BEV/BEV.html\n- Paper: https://arxiv.org/abs/2112.08274\n- Code: https://github.com/Arthur151/ROMP\n- Dataset: https://github.com/Arthur151/Relative_Human\n\n**UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection**\n\n- Paper: https://arxiv.org/abs/2111.08644\n- Dataset: https://github.com/lilygeorgescu/UBnormal\n\n**DanceTrack: 
Multi-Object Tracking in Uniform Appearance and Diverse Motion**\n\n- Homepage: https://dancetrack.github.io\n- Paper: https://arxiv.org/abs/2111.14690\n- Dataset: https://github.com/DanceTrack/DanceTrack\n\n**Visual Abductive Reasoning**\n\n- Paper: https://arxiv.org/abs/2203.14040\n- Code: https://github.com/leonnnop/VAR\n\n**Large-scale Video Panoptic Segmentation in the Wild: A Benchmark**\n\n- Paper: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset/blob/main/VIPSeg2022.pdf\n- Code: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset\n- Dataset: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset\n\n**Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions**\n\n- Homepage: https://ithaca365.mae.cornell.edu/\n\n- Paper: https://arxiv.org/abs/2208.01166\n\n<a name=\"New-Tasks\"></a>\n\n# 新任务(New Task)\n\n**Language-based Video Editing via Multi-Modal Multi-Level Transformer**\n\n- Paper: https://arxiv.org/abs/2104.01122\n- Code: None\n\n**It's About Time: Analog Clock Reading in the Wild**\n\n- Homepage: https://charigyang.github.io/abouttime/\n- Paper: https://arxiv.org/abs/2111.09162\n- Code: https://github.com/charigyang/itsabouttime\n- Demo: https://youtu.be/cbiMACA6dRc\n\n**Splicing ViT Features for Semantic Appearance Transfer**\n\n- Homepage: https://splice-vit.github.io/\n- Paper: https://arxiv.org/abs/2201.00424\n- Code: https://github.com/omerbt/Splice\n\n**Visual Abductive Reasoning**\n\n- Paper: https://arxiv.org/abs/2203.14040\n- Code: https://github.com/leonnnop/VAR\n\n<a name=\"Others\"></a>\n\n# 其他(Others)\n\n**Kubric: A scalable dataset generator**\n\n- Paper: https://arxiv.org/abs/2203.03570\n- Code: https://github.com/google-research/kubric\n- 中文解读：https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg\n\n**X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning**\n\n- Paper: https://arxiv.org/abs/2203.00843\n- Code: https://github.com/CurryYuan/X-Trans2Cap\n\n**Balanced MSE for Imbalanced 
Visual Regression**\n\n- Paper(Oral): https://arxiv.org/abs/2203.16427\n- Code: https://github.com/jiawei-ren/BalancedMSE\n\n**SNUG: Self-Supervised Neural Dynamic Garments**\n\n- Homepage: http://mslab.es/projects/SNUG/\n- Paper(Oral): https://arxiv.org/abs/2204.02219\n- Code: https://github.com/isantesteban/snug\n\n**Shape from Polarization for Complex Scenes in the Wild**\n\n- Homepage: https://chenyanglei.github.io/sfpwild/index.html\n- Paper: https://arxiv.org/abs/2112.11377\n- Code: https://github.com/ChenyangLEI/sfp-wild\n\n**LASER: LAtent SpacE Rendering for 2D Visual Localization**\n\n- Paper(Oral): https://arxiv.org/abs/2204.00157\n- Code: None\n\n**Single-Photon Structured Light**\n\n- Paper(Oral): https://arxiv.org/abs/2204.05300\n- Code: None\n\n**3DeformRS: Certifying Spatial Deformations on Point Clouds**\n\n- Paper: https://arxiv.org/abs/2204.05687\n- Code: None\n\n**Aesthetic Text Logo Synthesis via Content-aware Layout Inferring**\n\n- Paper: https://arxiv.org/abs/2204.02701\n- Dataset: https://github.com/yizhiwang96/TextLogoLayout\n- Code: https://github.com/yizhiwang96/TextLogoLayout\n\n**Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes**\n\n- Paper: https://arxiv.org/abs/2203.13412\n- Code: https://github.com/zjsong/SSPL\n\n**Robust and Accurate Superquadric Recovery: a Probabilistic Approach**\n\n- Paper(Oral): https://arxiv.org/abs/2111.14517\n- Code: https://github.com/bmlklwx/EMS-superquadric_fitting\n\n**Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence**\n\n- Paper: https://arxiv.org/abs/2203.00911\n- Code: None\n\n**Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer**\n\n- Paper(Oral): https://arxiv.org/abs/2204.08680\n- Code: https://github.com/zengwang430521/TCFormer\n\n**DeepDPM: Deep Clustering With an Unknown Number of Clusters**\n\n- Paper: https://arxiv.org/abs/2203.14309\n- Code: 
https://github.com/BGU-CS-VIL/DeepDPM\n\n**ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic**\n\n- Paper: https://arxiv.org/abs/2111.14447\n- Code: https://github.com/YoadTew/zero-shot-image-to-text\n\n**Proto2Proto: Can you recognize the car, the way I do?**\n\n- Paper: https://arxiv.org/abs/2204.11830\n- Code: https://github.com/archmaester/proto2proto\n\n**Putting People in their Place: Monocular Regression of 3D People in Depth**\n\n- Homepage: https://arthur151.github.io/BEV/BEV.html\n- Paper: https://arxiv.org/abs/2112.08274\n- Code: https://github.com/Arthur151/ROMP\n- Dataset: https://github.com/Arthur151/Relative_Human\n\n**Light Field Neural Rendering**\n\n- Homepage: https://light-field-neural-rendering.github.io/\n- Paper(Oral): https://arxiv.org/abs/2112.09687\n- Code: https://github.com/google-research/google-research/tree/master/light_field_neural_rendering\n\n**Neural Texture Extraction and Distribution for Controllable Person Image Synthesis**\n\n- Paper: https://arxiv.org/abs/2204.06160\n- Code: https://github.com/RenYurui/Neural-Texture-Extraction-Distribution\n\n**Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning**\n\n- Paper: https://arxiv.org/abs/2203.14333\n- Code: https://github.com/0liliulei/LIIR  "
  },
  {
    "path": "CVPR2023-Papers-with-Code.md",
    "content": "# CVPR 2023 论文和开源项目合集(Papers with Code)\n\n[CVPR 2023](https://openaccess.thecvf.com/CVPR2023?day=all) 论文和开源项目合集(papers with code)！\n\n**25.78% = 2360 / 9155**\n\nCVPR 2023 decisions are now available on OpenReview! This year, wereceived a record number of **9155** submissions (a 12% increase over CVPR 2022), and accepted **2360** papers, for a 25.78% acceptance rate.\n\n\n> 注1：欢迎各位大佬提交issue，分享CVPR 2023论文和开源项目！\n>\n> 注2：关于往年CV顶会论文以及其他优质CV论文和大盘点，详见： https://github.com/amusi/daily-paper-computer-vision\n>\n> - [CVPR 2019](CVPR2019-Papers-with-Code.md)\n> - [CVPR 2020](CVPR2020-Papers-with-Code.md)\n> - [CVPR 2021](CVPR2021-Papers-with-Code.md)\n> - [CVPR 2022](CVPR2022-Papers-with-Code.md)\n\n如果你想了解最新最优质的的CV论文、开源项目和学习资料，欢迎扫码加入【CVer学术交流群】！互相学习，一起进步~ \n\n![](CVer学术交流群.png)\n\n# 【CVPR 2023 论文开源目录】\n\n- [Backbone](#Backbone)\n- [CLIP](#CLIP)\n- [MAE](#MAE)\n- [GAN](#GAN)\n- [GNN](#GNN)\n- [MLP](#MLP)\n- [NAS](#NAS)\n- [OCR](#OCR)\n- [NeRF](#NeRF)\n- [DETR](#DETR)\n- [Prompt](#Prompt)\n- [Diffusion Models(扩散模型)](#Diffusion)\n- [Avatars](#Avatars)\n- [ReID(重识别)](#ReID)\n- [长尾分布(Long-Tail)](#Long-Tail)\n- [Vision Transformer](#Vision-Transformer)\n- [视觉和语言(Vision-Language)](#VL)\n- [自监督学习(Self-supervised Learning)](#SSL)\n- [数据增强(Data Augmentation)](#DA)\n- [目标检测(Object Detection)](#Object-Detection)\n- [目标跟踪(Visual Tracking)](#VT)\n- [语义分割(Semantic Segmentation)](#Semantic-Segmentation)\n- [实例分割(Instance Segmentation)](#Instance-Segmentation)\n- [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation)\n- [医学图像分割(Medical Image Segmentation)](#MIS)\n- [视频目标分割(Video Object Segmentation)](#VOS)\n- [视频实例分割(Video Instance Segmentation)](#VIS)\n- [参考图像分割(Referring Image Segmentation)](#RIS)\n- [图像抠图(Image Matting)](#Matting)\n- [图像编辑(Image Editing)](#Image-Editing)\n- [Low-level Vision](#LLV)\n- [超分辨率(Super-Resolution)](#SR)\n- [去噪(Denoising)](#Denoising)\n- [去模糊(Deblur)](#Deblur)\n- [3D点云(3D Point Cloud)](#3D-Point-Cloud)\n- [3D目标检测(3D Object Detection)](#3DOD)\n- 
[3D Semantic Segmentation](#3DSS)\n- [3D Object Tracking](#3D-Object-Tracking)\n- [3D Semantic Scene Completion](#3DSSC)\n- [3D Registration](#3D-Registration)\n- [3D Human Pose Estimation](#3D-Human-Pose-Estimation)\n- [3D Human Mesh Estimation](#3D-Human-Mesh-Estimation)\n- [Medical Image](#Medical-Image)\n- [Image Generation](#Image-Generation)\n- [Video Generation](#Video-Generation)\n- [Video Understanding](#Video-Understanding)\n- [Action Detection](#Action-Detection)\n- [Text Detection](#Text-Detection)\n- [Knowledge Distillation](#KD)\n- [Model Pruning](#Pruning)\n- [Image Compression](#IC)\n- [Anomaly Detection](#AD)\n- [3D Reconstruction](#3D-Reconstruction)\n- [Depth Estimation](#Depth-Estimation)\n- [Trajectory Prediction](#TP)\n- [Lane Detection](#Lane-Detection)\n- [Image Captioning](#Image-Captioning)\n- [Visual Question Answering](#VQA)\n- [Sign Language Recognition](#SLR)\n- [Video Prediction](#Video-Prediction)\n- [Novel View Synthesis](#NVS)\n- [Zero-Shot Learning](#ZSL)\n- [Stereo Matching](#Stereo-Matching)\n- [Feature Matching](#Feature-Matching)\n- [Scene Graph Generation](#SGG)\n- [Implicit Neural Representations](#INR)\n- [Image Quality Assessment](#IQA)\n- [Datasets](#Datasets)\n- [New Tasks](#New-Tasks)\n- [Others](#Others)\n\n<a name=\"Backbone\"></a>\n\n# Backbone\n\n**Integrally Pre-Trained Transformer Pyramid Networks**\n\n- Paper: https://arxiv.org/abs/2211.12735\n- Code: https://github.com/sunsmarterjie/iTPN\n\n**Stitchable Neural Networks**\n\n- Homepage: https://snnet.github.io/\n- Paper: https://arxiv.org/abs/2302.06586\n- Code: https://github.com/ziplab/SN-Net\n\n**Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks**\n\n- Paper: https://arxiv.org/abs/2303.03667\n- Code: 
https://github.com/JierunChen/FasterNet\n\n**BiFormer: Vision Transformer with Bi-Level Routing Attention**\n\n- Paper: https://arxiv.org/abs/2303.08810\n- Code: https://github.com/rayleizhu/BiFormer\n\n**DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network**\n\n- Paper: https://arxiv.org/abs/2303.02165\n- Code: https://github.com/alibaba/lightweight-neural-architecture-search\n\n**Vision Transformer with Super Token Sampling**\n\n- Paper: https://arxiv.org/abs/2211.11167\n- Code: https://github.com/hhb072/SViT\n\n**Hard Patches Mining for Masked Image Modeling**\n\n- Paper: None\n- Code: None\n\n**SMPConv: Self-moving Point Representations for Continuous Convolution**\n\n- Paper: https://arxiv.org/abs/2304.02330\n- Code: https://github.com/sangnekim/SMPConv\n\n**Making Vision Transformers Efficient from A Token Sparsification View**\n\n- Paper: https://arxiv.org/abs/2303.08685\n- Code: https://github.com/changsn/STViT-R\n\n<a name=\"CLIP\"></a>\n\n# CLIP\n\n**GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis**\n\n- Paper: https://arxiv.org/abs/2301.12959\n- Code: https://github.com/tobran/GALIP\n\n**DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation**\n\n- Paper: https://arxiv.org/abs/2303.06285\n- Code: https://github.com/Yueming6568/DeltaEdit\n\n<a name=\"MAE\"></a>\n\n# MAE\n\n**Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders**\n\n- Paper: https://arxiv.org/abs/2212.06785\n- Code: https://github.com/ZrrSkywalker/I2P-MAE\n\n**Generic-to-Specific Distillation of Masked Autoencoders**\n\n- Paper: https://arxiv.org/abs/2302.14771\n- Code: https://github.com/pengzhiliang/G2SD\n\n<a name=\"GAN\"></a>\n\n# GAN\n\n**DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation**\n\n- Paper: https://arxiv.org/abs/2303.06285\n- Code: https://github.com/Yueming6568/DeltaEdit\n\n<a name=\"NeRF\"></a>\n\n# NeRF\n\n**NoPe-NeRF: Optimising Neural Radiance Field with No Pose 
Prior**\n\n- Homepage: https://nope-nerf.active.vision/\n- Paper: https://arxiv.org/abs/2212.07388\n- Code: None\n\n**Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures**\n\n- Paper: https://arxiv.org/abs/2211.07600\n- Code: https://github.com/eladrich/latent-nerf\n\n**NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis**\n\n- Paper: https://arxiv.org/abs/2301.08556\n- Code: None\n\n**Panoptic Lifting for 3D Scene Understanding with Neural Fields**\n\n- Homepage: https://nihalsid.github.io/panoptic-lifting/\n- Paper: https://arxiv.org/abs/2212.09802\n- Code: None\n\n**NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer**\n\n- Homepage: https://redrock303.github.io/nerflix/\n- Paper: https://arxiv.org/abs/2303.06919\n- Code: None\n\n**HNeRV: A Hybrid Neural Representation for Videos**\n\n- Homepage: https://haochen-rye.github.io/HNeRV\n- Paper: https://arxiv.org/abs/2304.02633\n- Code: https://github.com/haochen-rye/HNeRV\n\n<a name=\"DETR\"></a>\n\n# DETR\n\n**DETRs with Hybrid Matching**\n\n- Paper: https://arxiv.org/abs/2207.13080\n- Code: https://github.com/HDETR\n\n<a name=\"Prompt\"></a>\n\n# Prompt\n\n**Diversity-Aware Meta Visual Prompting**\n\n- Paper: https://arxiv.org/abs/2303.08138\n- Code: https://github.com/shikiw/DAM-VP\n\n<a name=\"NAS\"></a>\n\n# NAS\n\n**PA&DA: Jointly Sampling PAth and DAta for Consistent NAS**\n\n- Paper: https://arxiv.org/abs/2302.14772\n- Code: https://github.com/ShunLu91/PA-DA\n\n<a name=\"Avatars\"></a>\n\n# Avatars\n\n**Structured 3D Features for Reconstructing Relightable and Animatable Avatars**\n\n- Homepage: https://enriccorona.github.io/s3f/\n- Paper: https://arxiv.org/abs/2212.06820\n- Code: None\n- Demo: https://www.youtube.com/watch?v=mcZGcQ6L-2s\n\n**Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos**\n\n- Homepage: https://augmentedperception.github.io/monoavatar/\n- Paper: 
https://arxiv.org/abs/2304.01436\n\n<a name=\"ReID\"></a>\n\n# ReID (Re-Identification)\n\n**Clothing-Change Feature Augmentation for Person Re-Identification**\n\n- Paper: None\n- Code: None\n\n**MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID**\n\n- Paper: https://arxiv.org/abs/2303.07065\n- Code: https://github.com/vimar-gu/MSINet\n\n**Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification**\n\n- Paper: https://arxiv.org/abs/2304.04205\n- Code: None\n\n**Large-scale Training Data Search for Object Re-identification**\n\n- Paper: https://arxiv.org/abs/2303.16186\n- Code: https://github.com/yorkeyao/SnP\n\n<a name=\"Diffusion\"></a>\n\n# Diffusion Models\n\n**Video Probabilistic Diffusion Models in Projected Latent Space**\n\n- Homepage: https://sihyun.me/PVDM/\n- Paper: https://arxiv.org/abs/2302.07685\n- Code: https://github.com/sihyun-yu/PVDM\n\n**Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models**\n\n- Paper: https://arxiv.org/abs/2211.10655\n- Code: None\n\n**Imagic: Text-Based Real Image Editing with Diffusion Models**\n\n- Homepage: https://imagic-editing.github.io/\n- Paper: https://arxiv.org/abs/2210.09276\n- Code: None\n\n**Parallel Diffusion Models of Operator and Image for Blind Inverse Problems**\n\n- Paper: https://arxiv.org/abs/2211.10656\n- Code: None\n\n**DiffRF: Rendering-guided 3D Radiance Field Diffusion**\n\n- Homepage: https://sirwyver.github.io/DiffRF/\n- Paper: https://arxiv.org/abs/2212.01206\n- Code: None\n\n**MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation**\n\n- Paper: https://arxiv.org/abs/2212.09478\n- Code: https://github.com/researchmm/MM-Diffusion\n\n**HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising**\n\n- Homepage: https://aminshabani.github.io/housediffusion/\n- Paper: https://arxiv.org/abs/2211.13287\n- Code: https://github.com/aminshabani/house_diffusion\n\n**TrojDiff: 
Trojan Attacks on Diffusion Models with Diverse Targets**\n\n- Paper: https://arxiv.org/abs/2303.05762\n- Code: https://github.com/chenweixin107/TrojDiff\n\n**Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption**\n\n- Paper: https://arxiv.org/abs/2207.03442\n- Code: https://github.com/shiyegao/DDA\n\n**DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration**\n\n- Paper: https://arxiv.org/abs/2303.06885\n- Code: None\n\n**Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion**\n\n- Homepage: https://nv-tlabs.github.io/trace-pace/\n- Paper: https://arxiv.org/abs/2304.01893\n- Code: None\n\n**Generative Diffusion Prior for Unified Image Restoration and Enhancement**\n\n- Paper: https://arxiv.org/abs/2304.01247\n- Code: None\n\n**Conditional Image-to-Video Generation with Latent Flow Diffusion Models**\n\n- Paper: https://arxiv.org/abs/2303.13744\n- Code: https://github.com/nihaomiao/CVPR23_LFDM\n\n<a name=\"Long-Tail\"></a>\n\n# Long-Tail\n\n**Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation**\n\n- Paper: https://arxiv.org/abs/2304.01279\n- Code: None\n\n<a name=\"Vision-Transformer\"></a>\n\n# Vision Transformer\n\n**Integrally Pre-Trained Transformer Pyramid Networks**\n\n- Paper: https://arxiv.org/abs/2211.12735\n- Code: https://github.com/sunsmarterjie/iTPN\n\n**Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors**\n\n- Homepage: https://niessnerlab.org/projects/hou2023mask3d.html\n- Paper: https://arxiv.org/abs/2302.14746\n- Code: None\n\n**Learning Trajectory-Aware Transformer for Video Super-Resolution**\n\n- Paper: https://arxiv.org/abs/2204.04216\n- Code: https://github.com/researchmm/TTVSR\n\n**Vision Transformers are Parameter-Efficient Audio-Visual Learners**\n\n- Homepage: https://yanbo.ml/project_page/LAVISH/\n- Code: https://github.com/GenjiB/LAVISH\n\n**Where We Are and What We're Looking At: Query Based Worldwide 
Image Geo-localization Using Hierarchies and Scenes**\n\n- Paper: https://arxiv.org/abs/2303.04249\n- Code: None\n\n**DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets**\n\n- Paper: https://arxiv.org/abs/2301.06051\n- Code: https://github.com/Haiyang-W/DSVT\n\n**DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting**\n\n- Paper: https://arxiv.org/abs/2211.10772\n- Code: https://github.com/ViTAE-Transformer/DeepSolo\n\n**BiFormer: Vision Transformer with Bi-Level Routing Attention**\n\n- Paper: https://arxiv.org/abs/2303.08810\n- Code: https://github.com/rayleizhu/BiFormer\n\n**Vision Transformer with Super Token Sampling**\n\n- Paper: https://arxiv.org/abs/2211.11167\n- Code: https://github.com/hhb072/SViT\n\n**BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision**\n\n- Paper: https://arxiv.org/abs/2211.10439\n- Code: None\n\n**BAEFormer: Bi-directional and Early Interaction Transformers for Bird’s Eye View Semantic Segmentation**\n\n- Paper: None\n- Code: None\n\n**Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention**\n\n- Paper: https://arxiv.org/abs/2304.03282\n- Code: None\n\n**Making Vision Transformers Efficient from A Token Sparsification View**\n\n- Paper: https://arxiv.org/abs/2303.08685\n- Code: https://github.com/changsn/STViT-R\n\n<a name=\"VL\"></a>\n\n# Vision-Language\n\n**GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods**\n\n- Paper: https://arxiv.org/abs/2301.01893\n- Code: None\n\n**Teaching Structured Vision&Language Concepts to Vision&Language Models**\n\n- Paper: https://arxiv.org/abs/2211.11733\n- Code: None\n\n**Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks**\n\n- Paper: https://arxiv.org/abs/2211.09808\n- Code: https://github.com/fundamentalvision/Uni-Perceiver\n\n**Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection 
to Image-Text Pre-Training**\n\n- Paper: https://arxiv.org/abs/2303.00040\n- Code: None\n\n**Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding**\n\n- Homepage: https://rllab-snu.github.io/projects/Meta-Explore/doc.html\n- Paper: https://arxiv.org/abs/2303.04077\n- Code: None\n\n**All in One: Exploring Unified Video-Language Pre-training**\n\n- Paper: https://arxiv.org/abs/2203.07303\n- Code: https://github.com/showlab/all-in-one\n\n**Position-guided Text Prompt for Vision Language Pre-training**\n\n- Paper: https://arxiv.org/abs/2212.09737\n- Code: https://github.com/sail-sg/ptp\n\n**EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding**\n\n- Paper: https://arxiv.org/abs/2209.14941\n- Code: https://github.com/yanmin-wu/EDA\n\n**CapDet: Unifying Dense Captioning and Open-World Detection Pretraining**\n\n- Paper: https://arxiv.org/abs/2303.02489\n- Code: None\n\n**FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks**\n\n- Paper: https://arxiv.org/abs/2303.02483\n- Code: https://github.com/BrandonHanx/FAME-ViL\n\n**Align and Attend: Multimodal Summarization with Dual Contrastive Losses**\n\n- Homepage: https://boheumd.github.io/A2Summ/\n- Paper: https://arxiv.org/abs/2303.07284\n- Code: https://github.com/boheumd/A2Summ\n\n**Multi-Modal Representation Learning with Text-Driven Soft Masks**\n\n- Paper: https://arxiv.org/abs/2304.00719\n- Code: None\n\n**Learning to Name Classes for Vision and Language Models**\n\n- Paper: https://arxiv.org/abs/2304.01830\n- Code: None\n\n<a name=\"Object-Detection\"></a>\n\n# Object Detection\n\n**YOLOv7: Trainable bag-of-freebies sets new 
state-of-the-art for real-time object detectors**\n\n- Paper: https://arxiv.org/abs/2207.02696\n- Code: https://github.com/WongKinYiu/yolov7\n\n**DETRs with Hybrid Matching**\n\n- Paper: https://arxiv.org/abs/2207.13080\n- Code: https://github.com/HDETR\n\n**Enhanced Training of Query-Based Object Detection via Selective Query Recollection**\n\n- Paper: https://arxiv.org/abs/2212.07593\n- Code: https://github.com/Fangyi-Chen/SQR\n\n**Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection**\n\n- Paper: https://arxiv.org/abs/2303.05892\n- Code: https://github.com/LutingWang/OADP\n\n<a name=\"VT\"></a>\n\n# Object Tracking\n\n**Simple Cues Lead to a Strong Multi-Object Tracker**\n\n- Paper: https://arxiv.org/abs/2206.04656\n- Code: None\n\n**Joint Visual Grounding and Tracking with Natural Language Specification**\n\n- Paper: https://arxiv.org/abs/2303.12027\n- Code: https://github.com/lizhou-cs/JointNLT\n\n<a name=\"Semantic-Segmentation\"></a>\n\n# Semantic Segmentation\n\n**Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos**\n\n- Paper: https://arxiv.org/abs/2303.07224\n- Code: https://github.com/THU-LYJ-Lab/AR-Seg\n\n**FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding**\n\n- Paper: https://arxiv.org/abs/2304.02135\n- Code: https://github.com/uark-cviu/FREDOM\n\n<a name=\"MIS\"></a>\n\n# Medical Image Segmentation\n\n**Label-Free Liver Tumor Segmentation**\n\n- Paper: https://arxiv.org/abs/2303.14869\n- Code: https://github.com/MrGiovanni/SyntheticTumors\n\n**Directional Connectivity-based Segmentation of Medical Images**\n\n- Paper: https://arxiv.org/abs/2304.00145\n- Code: https://github.com/Zyun-Y/DconnNet\n\n**Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation**\n\n- Paper: https://arxiv.org/abs/2305.00673\n- Code: https://github.com/DeepMed-Lab-ECNU/BCP\n\n**Devil is in the Queries: Advancing Mask Transformers for Real-world Medical Image 
Segmentation and Out-of-Distribution Localization**\n\n- Paper: https://arxiv.org/abs/2304.00212\n- Code: None\n\n**Fair Federated Medical Image Segmentation via Client Contribution Estimation**\n\n- Paper: https://arxiv.org/abs/2303.16520\n- Code: https://github.com/NVIDIA/NVFlare/tree/dev/research/fed-ce\n\n**Ambiguous Medical Image Segmentation using Diffusion Models**\n\n- Homepage: https://aimansnigdha.github.io/cimd/\n- Paper: https://arxiv.org/abs/2304.04745\n- Code: https://github.com/aimansnigdha/Ambiguous-Medical-Image-Segmentation-using-Diffusion-Models\n\n**Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation**\n\n- Paper: https://arxiv.org/abs/2303.13090\n- Code: https://github.com/HengCai-NJU/DeSCO\n\n**MagicNet: Semi-Supervised Multi-Organ Segmentation via Magic-Cube Partition and Recovery**\n\n- Paper: https://arxiv.org/abs/2301.01767\n- Code: https://github.com/DeepMed-Lab-ECNU/MagicNet\n\n**MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation**\n\n- Paper: https://openaccess.thecvf.com/content/CVPR2023/html/Wang_MCF_Mutual_Correction_Framework_for_Semi-Supervised_Medical_Image_Segmentation_CVPR_2023_paper.html\n- Code: https://github.com/WYC-321/MCF\n\n**Rethinking Few-Shot Medical Segmentation: A Vector Quantization View**\n\n- Paper: https://openaccess.thecvf.com/content/CVPR2023/html/Huang_Rethinking_Few-Shot_Medical_Segmentation_A_Vector_Quantization_View_CVPR_2023_paper.html\n- Code: None\n\n**Pseudo-label Guided Contrastive Learning for Semi-supervised Medical Image Segmentation**\n\n- Paper: https://openaccess.thecvf.com/content/CVPR2023/html/Basak_Pseudo-Label_Guided_Contrastive_Learning_for_Semi-Supervised_Medical_Image_Segmentation_CVPR_2023_paper.html\n- Code: https://github.com/hritam-98/PatchCL-MedSeg\n\n**SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation**\n\n- Paper: 
https://arxiv.org/abs/2305.11012\n- Code: None\n\n**DoNet: Deep De-overlapping Network for Cytology Instance Segmentation**\n\n- Paper: https://arxiv.org/abs/2303.14373\n- Code: https://github.com/DeepDoNet/DoNet\n\n<a name=\"VOS\"></a>\n\n# Video Object Segmentation\n\n**Two-shot Video Object Segmentation**\n\n- Paper: https://arxiv.org/abs/2303.12078\n- Code: https://github.com/yk-pku/Two-shot-Video-Object-Segmentation\n\n**Under Video Object Segmentation Section**\n\n- Paper: https://arxiv.org/abs/2303.07815\n- Code: None\n\n<a name=\"VIS\"></a>\n\n# Video Instance Segmentation\n\n**Mask-Free Video Instance Segmentation**\n\n- Paper: https://arxiv.org/abs/2303.15904\n- Code: https://github.com/SysCV/MaskFreeVis\n\n<a name=\"RIS\"></a>\n\n# Referring Image Segmentation\n\n**PolyFormer: Referring Image Segmentation as Sequential Polygon Generation**\n\n- Paper: https://arxiv.org/abs/2302.07387\n- Code: None\n\n<a name=\"3D-Point-Cloud\"></a>\n\n# 3D Point Cloud\n\n**Physical-World Optical Adversarial Attacks on 3D Face Recognition**\n\n- Paper: https://arxiv.org/abs/2205.13412\n- Code: https://github.com/PolyLiYJ/SLAttack.git\n\n**IterativePFN: True Iterative Point Cloud Filtering**\n\n- Paper: https://arxiv.org/abs/2304.01529\n- Code: https://github.com/ddsediri/IterativePFN\n\n**Attention-based Point Cloud Edge Sampling**\n\n- Homepage: https://junweizheng93.github.io/publications/APES/APES.html\n- Paper: https://arxiv.org/abs/2302.14673\n- Code: https://github.com/JunweiZheng93/APES\n\n<a name=\"3DOD\"></a>\n\n# 3D Object Detection\n\n**DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets**\n\n- Paper: https://arxiv.org/abs/2301.06051\n- Code: https://github.com/Haiyang-W/DSVT\n\n**FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection**\n\n- Paper: https://arxiv.org/abs/2301.04467\n- Code: None\n\n**3D Video Object Detection with Learnable Object-Centric Global Optimization**\n\n- Paper: 
None\n- Code: None\n\n**Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection**\n\n- Paper: https://arxiv.org/abs/2304.01464\n- Code: https://github.com/azhuantou/HSSDA\n\n<a name=\"3DSS\"></a>\n\n# 3D Semantic Segmentation\n\n**Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2303.11203\n- Code: https://github.com/l1997i/lim3d\n\n<a name=\"3DSSC\"></a>\n\n# 3D Semantic Scene Completion\n\n**VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion**\n\n- Paper: https://arxiv.org/abs/2302.12251\n- Code: https://github.com/NVlabs/VoxFormer\n\n<a name=\"3D-Registration\"></a>\n\n# 3D Registration\n\n**Robust Outlier Rejection for 3D Registration with Variational Bayes**\n\n- Paper: https://arxiv.org/abs/2304.01514\n- Code: https://github.com/Jiang-HB/VBReg\n\n<a name=\"3D-Human-Pose-Estimation\"></a>\n\n# 3D Human Pose Estimation\n\n<a name=\"3D-Human-Mesh-Estimation\"></a>\n\n# 3D Human Mesh Estimation\n\n**3D Human Mesh Estimation from Virtual Markers**\n\n- Paper: https://arxiv.org/abs/2303.11726\n- Code: https://github.com/ShirleyMaxx/VirtualMarker\n\n<a name=\"LLV\"></a>\n\n# Low-level Vision\n\n**Causal-IR: Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective**\n\n- Paper: https://arxiv.org/abs/2303.06859\n- Code: https://github.com/lixinustc/Casual-IR-DIL\n\n**Burstormer: Burst Image Restoration and Enhancement Transformer**\n\n- Paper: https://arxiv.org/abs/2304.01194\n- Code: http://github.com/akshaydudhane16/Burstormer\n\n<a name=\"SR\"></a>\n\n# Super-Resolution\n\n**Super-Resolution Neural Operator**\n\n- Paper: https://arxiv.org/abs/2303.02584\n- Code: https://github.com/2y7c3/Super-Resolution-Neural-Operator\n\n## Video Super-Resolution\n\n**Learning Trajectory-Aware Transformer for Video Super-Resolution**\n\n- Paper: https://arxiv.org/abs/2204.04216\n- Code: https://github.com/researchmm/TTVSR\n\n<a 
name=\"Denoising\"></a>\n\n# Denoising\n\n## Image Denoising\n\n**Masked Image Training for Generalizable Deep Image Denoising**\n\n- Paper: https://arxiv.org/abs/2303.13132\n- Code: https://github.com/haoyuc/MaskedDenoising\n\n<a name=\"Image-Generation\"></a>\n\n# Image Generation\n\n**GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis**\n\n- Paper: https://arxiv.org/abs/2301.12959\n- Code: https://github.com/tobran/GALIP\n\n**MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis**\n\n- Paper: https://arxiv.org/abs/2211.09117\n- Code: https://github.com/LTH14/mage\n\n**Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation**\n\n- Paper: https://arxiv.org/abs/2304.01816\n- Code: None\n\n**Few-shot Semantic Image Synthesis with Class Affinity Transfer**\n\n- Paper: https://arxiv.org/abs/2304.02321\n- Code: None\n\n**TopNet: Transformer-based Object Placement Network for Image Compositing**\n\n- Paper: https://arxiv.org/abs/2304.03372\n- Code: None\n\n<a name=\"Video-Generation\"></a>\n\n# Video Generation\n\n**MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation**\n\n- Paper: https://arxiv.org/abs/2212.09478\n- Code: https://github.com/researchmm/MM-Diffusion\n\n**Conditional Image-to-Video Generation with Latent Flow Diffusion Models**\n\n- Paper: https://arxiv.org/abs/2303.13744\n- Code: https://github.com/nihaomiao/CVPR23_LFDM\n\n<a name=\"Video-Understanding\"></a>\n\n# Video Understanding\n\n**Learning Transferable Spatiotemporal Representations from Natural Script Knowledge**\n\n- Paper: https://arxiv.org/abs/2209.15280\n- Code: https://github.com/TencentARC/TVTS\n\n**Frame Flexible Network**\n\n- Paper: https://arxiv.org/abs/2303.14817\n- Code: https://github.com/BeSpontaneous/FFN\n\n**Masked Motion Encoding for Self-Supervised Video Representation Learning**\n\n- Paper: https://arxiv.org/abs/2210.06096\n- Code: 
https://github.com/XinyuSun/MME\n\n**MARLIN: Masked Autoencoder for facial video Representation LearnING**\n\n- Paper: https://arxiv.org/abs/2211.06627\n- Code: https://github.com/ControlNet/MARLIN\n\n<a name=\"Action-Detection\"></a>\n\n# Action Detection\n\n**TriDet: Temporal Action Detection with Relative Boundary Modeling**\n\n- Paper: https://arxiv.org/abs/2303.07347\n- Code: https://github.com/dingfengshi/TriDet\n\n<a name=\"Text-Detection\"></a>\n\n# Text Detection\n\n**DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting**\n\n- Paper: https://arxiv.org/abs/2211.10772\n- Code: https://github.com/ViTAE-Transformer/DeepSolo\n\n<a name=\"KD\"></a>\n\n# Knowledge Distillation\n\n**Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation**\n\n- Paper: https://arxiv.org/abs/2302.14290\n- Code: None\n\n**Generic-to-Specific Distillation of Masked Autoencoders**\n\n- Paper: https://arxiv.org/abs/2302.14771\n- Code: https://github.com/pengzhiliang/G2SD\n\n<a name=\"Pruning\"></a>\n\n# Model Pruning\n\n**DepGraph: Towards Any Structural Pruning**\n\n- Paper: https://arxiv.org/abs/2301.12900\n- Code: https://github.com/VainF/Torch-Pruning\n\n<a name=\"IC\"></a>\n\n# Image Compression\n\n**Context-Based Trit-Plane Coding for Progressive Image Compression**\n\n- Paper: https://arxiv.org/abs/2303.05715\n- Code: https://github.com/seungminjeon-github/CTC\n\n<a name=\"AD\"></a>\n\n# Anomaly Detection\n\n**Deep Feature In-painting for Unsupervised Anomaly Detection in X-ray Images**\n\n- Paper: https://arxiv.org/abs/2111.13495\n- Code: https://github.com/tiangexiang/SQUID\n\n<a name=\"3D-Reconstruction\"></a>\n\n# 3D Reconstruction\n\n**OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields**\n\n- Paper: https://arxiv.org/abs/2211.12886\n- Code: None\n\n**SparsePose: Sparse-View Camera Pose Regression and Refinement**\n\n- 
Paper: https://arxiv.org/abs/2211.16991\n- Code: None\n\n**NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction**\n\n- Paper: https://arxiv.org/abs/2303.02375\n- Code: None\n\n**Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition**\n\n- Homepage: https://moygcc.github.io/vid2avatar/\n- Paper: https://arxiv.org/abs/2302.11566\n- Code: https://github.com/MoyGcc/vid2avatar\n- Demo: https://youtu.be/EGi47YeIeGQ\n\n**To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision**\n\n- Paper: https://arxiv.org/abs/2106.09614\n- Code: https://github.com/unibas-gravis/Occlusion-Robust-MoFA\n\n**Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction**\n\n- Paper: https://arxiv.org/abs/2303.05937\n- Code: None\n\n**3D Cinemagraphy from a Single Image**\n\n- Homepage: https://xingyi-li.github.io/3d-cinemagraphy/\n- Paper: https://arxiv.org/abs/2303.05724\n- Code: https://github.com/xingyi-li/3d-cinemagraphy\n\n**Revisiting Rotation Averaging: Uncertainties and Robust Losses**\n\n- Paper: https://arxiv.org/abs/2303.05195\n- Code: https://github.com/zhangganlin/GlobalSfMpy\n\n**FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction**\n\n- Paper: https://arxiv.org/abs/2211.13874\n- Code: https://github.com/csbhr/FFHQ-UV\n\n**A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images**\n\n- Homepage: https://younglbw.github.io/HRN-homepage/\n- Paper: https://arxiv.org/abs/2302.14434\n- Code: https://github.com/youngLBW/HRN\n\n<a name=\"Depth-Estimation\"></a>\n\n# Depth Estimation\n\n**Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation**\n\n- Paper: https://arxiv.org/abs/2211.13202\n- Code: https://github.com/noahzn/Lite-Mono\n\n<a name=\"TP\"></a>\n\n# Trajectory Prediction\n\n**IPCC-TP: Utilizing 
Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction**\n\n- Paper: https://arxiv.org/abs/2303.00575\n- Code: None\n\n**EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning**\n\n- Paper: https://arxiv.org/abs/2303.10876\n- Code: https://github.com/MediaBrain-SJTU/EqMotion\n\n<a name=\"Lane-Detection\"></a>\n\n# Lane Detection\n\n**Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection**\n\n- Paper: https://arxiv.org/abs/2301.02371\n- Code: https://github.com/tusen-ai/Anchor3DLane\n\n**BEV-LaneDet: An Efficient 3D Lane Detection Based on Virtual Camera via Key-Points**\n\n- Paper: https://arxiv.org/abs/2210.06006v3\n- Code: https://github.com/gigo-team/bev_lane_det\n\n<a name=\"Image-Captioning\"></a>\n\n# Image Captioning\n\n**ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing**\n\n- Paper: https://arxiv.org/abs/2303.02437\n- Code: None\n\n**Cross-Domain Image Captioning with Discriminative Finetuning**\n\n- Paper: https://arxiv.org/abs/2304.01662\n- Code: None\n\n**Model-Agnostic Gender Debiased Image Captioning**\n\n- Paper: https://arxiv.org/abs/2304.03693\n- Code: None\n\n<a name=\"VQA\"></a>\n\n# Visual Question Answering\n\n**MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering**\n\n- Paper: https://arxiv.org/abs/2303.01239\n- Code: https://github.com/jingjing12110/MixPHM\n\n<a name=\"SLR\"></a>\n\n# Sign Language Recognition\n\n**Continuous Sign Language Recognition with Correlation Network**\n\n- Paper: https://arxiv.org/abs/2303.03202\n- Code: https://github.com/hulianyuyy/CorrNet\n\n<a name=\"Video-Prediction\"></a>\n\n# Video Prediction\n\n**MOSO: Decomposing MOtion, Scene and Object for Video Prediction**\n\n- Paper: https://arxiv.org/abs/2303.03684\n- Code: https://github.com/anonymous202203/MOSO\n\n<a name=\"NVS\"></a>\n\n# Novel View Synthesis\n\n**3D 
Video Loops from Asynchronous Input**\n\n- Homepage: https://limacv.github.io/VideoLoop3D_web/\n- Paper: https://arxiv.org/abs/2303.05312\n- Code: https://github.com/limacv/VideoLoop3D\n\n<a name=\"ZSL\"></a>\n\n# Zero-Shot Learning\n\n**Bi-directional Distribution Alignment for Transductive Zero-Shot Learning**\n\n- Paper: https://arxiv.org/abs/2303.08698\n- Code: https://github.com/Zhicaiwww/Bi-VAEGAN\n\n**Semantic Prompt for Few-Shot Learning**\n\n- Paper: None\n- Code: None\n\n<a name=\"Stereo-Matching\"></a>\n\n# Stereo Matching\n\n**Iterative Geometry Encoding Volume for Stereo Matching**\n\n- Paper: https://arxiv.org/abs/2303.06615\n- Code: https://github.com/gangweiX/IGEV\n\n**Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation**\n\n- Paper: https://arxiv.org/abs/2304.00152\n- Code: None\n\n<a name=\"Feature-Matching\"></a>\n\n# Feature Matching\n\n**Adaptive Spot-Guided Transformer for Consistent Local Feature Matching**\n\n- Homepage: https://astr2023.github.io/\n- Paper: https://arxiv.org/abs/2303.16624\n- Code: https://github.com/ASTR2023/ASTR\n\n<a name=\"SGG\"></a>\n\n# Scene Graph Generation\n\n**Prototype-based Embedding Network for Scene Graph Generation**\n\n- Paper: https://arxiv.org/abs/2303.07096\n- Code: None\n\n<a name=\"INR\"></a>\n\n# Implicit Neural Representations\n\n**Polynomial Implicit Neural Representations For Large Diverse Datasets**\n\n- Paper: https://arxiv.org/abs/2303.11424\n- Code: https://github.com/Rajhans0/Poly_INR\n\n<a name=\"IQA\"></a>\n\n# Image Quality Assessment\n\n**Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild**\n\n- Paper: https://arxiv.org/abs/2304.00451\n- Code: None\n\n<a name=\"Datasets\"></a>\n\n# Datasets\n\n**Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes**\n\n- Paper: https://arxiv.org/abs/2303.02760\n- Code: 
None\n\n**Align and Attend: Multimodal Summarization with Dual Contrastive Losses**\n\n- Homepage: https://boheumd.github.io/A2Summ/\n- Paper: https://arxiv.org/abs/2303.07284\n- Code: https://github.com/boheumd/A2Summ\n\n**GeoNet: Benchmarking Unsupervised Adaptation across Geographies**\n\n- Homepage: https://tarun005.github.io/GeoNet/\n- Paper: https://arxiv.org/abs/2303.15443\n\n**CelebV-Text: A Large-Scale Facial Text-Video Dataset**\n\n- Homepage: https://celebv-text.github.io/\n- Paper: https://arxiv.org/abs/2303.14717\n\n<a name=\"Others\"></a>\n\n# 其他(Others)\n\n**Interactive Segmentation as Gaussian Process Classification**\n\n- Paper: https://arxiv.org/abs/2302.14578\n- Code: None\n\n**Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger**\n\n- Paper: https://arxiv.org/abs/2302.14677\n- Code: None\n\n**SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries**\n\n- Homepage: http://bit.ly/splinecam\n- Paper: https://arxiv.org/abs/2302.12828\n- Code: None\n\n**SCOTCH and SODA: A Transformer Video Shadow Detection Framework**\n\n- Paper: https://arxiv.org/abs/2211.06885\n- Code: None\n\n**DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization**\n\n- Homepage: https://ai4ce.github.io/DeepMapping2/\n- Paper: https://arxiv.org/abs/2212.06331\n- Code: https://github.com/ai4ce/DeepMapping2\n\n**RelightableHands: Efficient Neural Relighting of Articulated Hand Models**\n\n- Homepage: https://sh8.io/#/relightable_hands\n- Paper: https://arxiv.org/abs/2302.04866\n- Code: None\n\n**Token Turing Machines**\n\n- Paper: https://arxiv.org/abs/2211.09119\n- Code: None\n\n**Single Image Backdoor Inversion via Robust Smoothed Classifiers**\n\n- Paper: https://arxiv.org/abs/2303.00215\n- Code: https://github.com/locuslab/smoothinv\n\n**To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision**\n\n- Paper: https://arxiv.org/abs/2106.09614\n- Code: 
https://github.com/unibas-gravis/Occlusion-Robust-MoFA\n\n**HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics**\n\n- Homepage: https://dolorousrtur.github.io/hood/\n- Paper: https://arxiv.org/abs/2212.07242\n- Code: https://github.com/dolorousrtur/hood\n- Demo: https://www.youtube.com/watch?v=cBttMDPrUYY\n\n**A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others**\n\n- Paper: https://arxiv.org/abs/2212.04825\n- Code: https://github.com/facebookresearch/Whac-A-Mole.git\n\n**RelightableHands: Efficient Neural Relighting of Articulated Hand Models**\n\n- Homepage: https://sh8.io/#/relightable_hands\n- Paper: https://arxiv.org/abs/2302.04866\n- Code: None\n- Demo: https://sh8.io/static/media/teacher_video.923d87957fe0610730c2.mp4\n\n**Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation**\n\n- Paper: https://arxiv.org/abs/2303.00914\n- Code: None\n\n**Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression**\n\n- Paper: https://arxiv.org/abs/2303.01052\n- Code: None\n\n**UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy**\n\n- Paper: https://arxiv.org/abs/2303.00938\n- Code: None\n\n**Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness**\n\n- Paper: https://arxiv.org/abs/2303.00971\n- Code: https://github.com/zhijieshen-bjtu/DOPNet\n\n**Learning Neural Parametric Head Models**\n\n- Homepage: https://simongiebenhain.github.io/NPHM\n- Paper: https://arxiv.org/abs/2212.02761\n- Code: None\n\n**A Meta-Learning Approach to Predicting Performance and Data Requirements**\n\n- Paper: https://arxiv.org/abs/2303.01598\n- Code: None\n\n**MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision**\n\n- Homepage: https://imagine.enpc.fr/~guedona/MACARONS/\n- Paper: 
https://arxiv.org/abs/2303.03315\n- Code: None\n\n**Masked Images Are Counterfactual Samples for Robust Fine-tuning**\n\n- Paper: https://arxiv.org/abs/2303.03052\n- Code: None\n\n**HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling**\n\n- Paper: https://arxiv.org/abs/2303.02700\n- Code: None\n\n**Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization**\n\n- Paper: https://arxiv.org/abs/2303.02328\n- Code: None\n\n**Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization**\n\n- Paper: https://arxiv.org/abs/2303.03108\n- Code: None\n\n**Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples**\n\n- Paper: https://arxiv.org/abs/2301.01217\n- Code: https://github.com/jiamingzhang94/Unlearnable-Clusters \n\n**Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes**\n\n- Paper: https://arxiv.org/abs/2303.04249\n- Code: None\n\n**UniHCP: A Unified Model for Human-Centric Perceptions**\n\n- Paper: https://arxiv.org/abs/2303.02936\n- Code: https://github.com/OpenGVLab/UniHCP\n\n**CUDA: Convolution-based Unlearnable Datasets**\n\n- Paper: https://arxiv.org/abs/2303.04278\n- Code: https://github.com/vinusankars/Convolution-based-Unlearnability\n\n**AdaptiveMix: Robust Feature Representation via Shrinking Feature Space**\n\n- Paper: https://arxiv.org/abs/2303.01559\n- Code: https://github.com/WentianZhang-ML/AdaptiveMix \n\n**Physical-World Optical Adversarial Attacks on 3D Face Recognition**\n\n- Paper: https://arxiv.org/abs/2205.13412\n- Code: https://github.com/PolyLiYJ/SLAttack.git\n\n**DPE: Disentanglement of Pose and Expression for General Video Portrait Editing**\n\n- Paper: https://arxiv.org/abs/2301.06281\n- Code: https://carlyx.github.io/DPE/ 
\n\n**SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation**\n\n- Paper: https://arxiv.org/abs/2211.12194\n- Code: https://github.com/Winfredy/SadTalker\n\n**Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models**\n\n- Paper: None\n- Code: None\n\n**Sharpness-Aware Gradient Matching for Domain Generalization**\n\n- Paper: None\n- Code: https://github.com/Wang-pengfei/SAGM\n\n**Mind the Label-shift for Augmentation-based Graph Out-of-distribution Generalization**\n\n- Paper: None\n- Code: None\n\n**Blind Video Deflickering by Neural Filtering with a Flawed Atlas**\n\n- Homepage:  https://chenyanglei.github.io/deflicker \n- Paper: None\n- Code: None\n\n**RiDDLE: Reversible and Diversified De-identification with Latent Encryptor**\n\n- Paper: None\n- Code:  https://github.com/ldz666666/RiDDLE \n\n**PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation**\n\n- Paper: https://arxiv.org/abs/2303.07337\n- Code: None\n\n**Upcycling Models under Domain and Category Shift**\n\n- Paper: https://arxiv.org/abs/2303.07110\n- Code: https://github.com/ispc-lab/GLC\n\n**Modality-Agnostic Debiasing for Single Domain Generalization**\n\n- Paper: https://arxiv.org/abs/2303.07123\n- Code: None\n\n**Progressive Open Space Expansion for Open-Set Model Attribution**\n\n- Paper: https://arxiv.org/abs/2303.06877\n- Code: None\n\n**Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies**\n\n- Paper: https://arxiv.org/abs/2303.06856\n- Code: None\n\n**GFPose: Learning 3D Human Pose Prior with Gradient Fields**\n\n- Paper: https://arxiv.org/abs/2212.08641\n- Code: https://github.com/Embracing/GFPose \n\n**PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment**\n\n- Paper: https://arxiv.org/abs/2303.11526\n- Code: https://github.com/Zhang-VISLab\n\n**Sketch2Saliency: Learning to 
Detect Salient Objects from Human Drawings**\n\n- Paper: https://arxiv.org/abs/2303.11502\n- Code: None\n\n**Boundary Unlearning**\n\n- Paper: https://arxiv.org/abs/2303.11570\n- Code: None\n\n**ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing**\n\n- Paper: https://arxiv.org/abs/2303.17096\n- Code: https://github.com/alibaba/easyrobust\n\n**Zero-shot Model Diagnosis**\n\n- Paper: https://arxiv.org/abs/2303.15441\n- Code: None\n\n**GeoNet: Benchmarking Unsupervised Adaptation across Geographies**\n\n- Homepage: https://tarun005.github.io/GeoNet/\n- Paper: https://arxiv.org/abs/2303.15443\n\n**Quantum Multi-Model Fitting**\n\n- Paper: https://arxiv.org/abs/2303.15444\n- Code: https://github.com/FarinaMatteo/qmmf\n\n**DivClust: Controlling Diversity in Deep Clustering**\n\n- Paper: https://arxiv.org/abs/2304.01042\n- Code: None\n\n**Neural Volumetric Memory for Visual Locomotion Control**\n\n- Homepage: https://rchalyang.github.io/NVM\n- Paper: https://arxiv.org/abs/2304.01201\n- Code: https://rchalyang.github.io/NVM\n\n**MonoHuman: Animatable Human Neural Field from Monocular Video**\n\n- Homepage: https://yzmblog.github.io/projects/MonoHuman/\n- Paper: https://arxiv.org/abs/2304.02001\n- Code: https://github.com/Yzmblog/MonoHuman\n\n**Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion**\n\n- Homepage: https://nv-tlabs.github.io/trace-pace/\n- Paper: https://arxiv.org/abs/2304.01893\n- Code: None\n\n**Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification**\n\n- Paper: https://arxiv.org/abs/2304.01804\n- Code: None\n\n**HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering**\n\n- Paper: https://arxiv.org/abs/2304.01686\n- Code: None\n\n**On the Stability-Plasticity Dilemma of Class-Incremental Learning**\n\n- Paper: https://arxiv.org/abs/2304.01663\n- Code: None\n\n**Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning**\n\n- 
Paper: https://arxiv.org/abs/2304.01482\n- Code: None\n\n**VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution**\n\n- Paper: https://arxiv.org/abs/2304.01434\n- Code: https://github.com/jaeill/CVPR23-VNE\n\n**Detecting and Grounding Multi-Modal Media Manipulation**\n\n- Homepage: https://rshaojimmy.github.io/Projects/MultiModal-DeepFake\n- Paper: https://arxiv.org/abs/2304.02556\n- Code: https://github.com/rshaojimmy/MultiModal-DeepFake\n\n**Meta-causal Learning for Single Domain Generalization**\n\n- Paper: https://arxiv.org/abs/2304.03709\n- Code: None\n\n**Disentangling Writer and Character Styles for Handwriting Generation**\n\n- Paper: https://arxiv.org/abs/2303.14736\n- Code: https://github.com/dailenson/SDT\n\n**DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects**\n\n- Homepage: https://www.chenbao.tech/dexart/\n\n- Code: https://github.com/Kami-code/dexart-release\n\n**Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision**\n\n- Homepage: https://toytiny.github.io/publication/23-cmflow-cvpr/index.html \n- Paper: https://arxiv.org/abs/2303.00462\n- Code: https://github.com/Toytiny/CMFlow\n\n**Marching-Primitives: Shape Abstraction from Signed Distance Function**\n\n- Paper: https://arxiv.org/abs/2303.13190\n- Code: https://github.com/ChirikjianLab/Marching-Primitives\n\n**Towards Trustable Skin Cancer Diagnosis via Rewriting Model's Decision**\n\n- Paper: https://arxiv.org/abs/2303.00885\n- Code: None"
  },
  {
    "path": "CVPR2024-Papers-with-Code.md",
    "content": "# CVPR 2024 论文和开源项目合集(Papers with Code)\n\nCVPR 2024 decisions are now available on OpenReview！\n\n\n> 注1：欢迎各位大佬提交issue，分享CVPR 2024论文和开源项目！\n>\n> 注2：关于往年CV顶会论文以及其他优质CV论文和大盘点，详见： https://github.com/amusi/daily-paper-computer-vision\n>\n> - [ECCV 2024](https://github.com/amusi/ECCV2024-Papers-with-Code)\n> - [CVPR 2023](CVPR2022-Papers-with-Code.md)\n\n欢迎扫码加入【CVer学术交流群】，这是最大的计算机视觉AI知识星球！每日更新，第一时间分享最新最前沿的计算机视觉、AI绘画、图像处理、深度学习、自动驾驶、医疗影像和AIGC等方向的学习资料，学起来！\n\n![](CVer学术交流群.png)\n\n# 【CVPR 2024 论文开源目录】\n\n- [3DGS(Gaussian Splatting)](#3DGS)\n- [Avatars](#Avatars)\n- [Backbone](#Backbone)\n- [CLIP](#CLIP)\n- [MAE](#MAE)\n- [Embodied AI](#Embodied-AI)\n- [GAN](#GAN)\n- [GNN](#GNN)\n- [多模态大语言模型(MLLM)](#MLLM)\n- [大语言模型(LLM)](#LLM)\n- [NAS](#NAS)\n- [OCR](#OCR)\n- [NeRF](#NeRF)\n- [DETR](#DETR)\n- [Prompt](#Prompt)\n- [扩散模型(Diffusion Models)](#Diffusion)\n- [ReID(重识别)](#ReID)\n- [长尾分布(Long-Tail)](#Long-Tail)\n- [Vision Transformer](#Vision-Transformer)\n- [视觉和语言(Vision-Language)](#VL)\n- [自监督学习(Self-supervised Learning)](#SSL)\n- [数据增强(Data Augmentation)](#DA)\n- [目标检测(Object Detection)](#Object-Detection)\n- [异常检测(Anomaly Detection)](#Anomaly-Detection)\n- [目标跟踪(Visual Tracking)](#VT)\n- [语义分割(Semantic Segmentation)](#Semantic-Segmentation)\n- [实例分割(Instance Segmentation)](#Instance-Segmentation)\n- [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation)\n- [医学图像(Medical Image)](#MI)\n- [医学图像分割(Medical Image Segmentation)](#MIS)\n- [视频目标分割(Video Object Segmentation)](#VOS)\n- [视频实例分割(Video Instance Segmentation)](#VIS)\n- [参考图像分割(Referring Image Segmentation)](#RIS)\n- [图像抠图(Image Matting)](#Matting)\n- [图像编辑(Image Editing)](#Image-Editing)\n- [Low-level Vision](#LLV)\n- [超分辨率(Super-Resolution)](#SR)\n- [去噪(Denoising)](#Denoising)\n- [去模糊(Deblur)](#Deblur)\n- [自动驾驶(Autonomous Driving)](#Autonomous-Driving)\n- [3D点云(3D Point Cloud)](#3D-Point-Cloud)\n- [3D目标检测(3D Object Detection)](#3DOD)\n- [3D语义分割(3D Semantic Segmentation)](#3DSS)\n- [3D目标跟踪(3D Object 
Tracking)](#3D-Object-Tracking)\n- [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC)\n- [3D配准(3D Registration)](#3D-Registration)\n- [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation)\n- [3D人体Mesh估计(3D Human Mesh Estimation)](#3D-Human-Pose-Estimation)\n- [医学图像(Medical Image)](#Medical-Image)\n- [图像生成(Image Generation)](#Image-Generation)\n- [视频生成(Video Generation)](#Video-Generation)\n- [3D生成(3D Generation)](#3D-Generation)\n- [视频理解(Video Understanding)](#Video-Understanding)\n- [行为检测(Action Detection)](#Action-Detection)\n- [文本检测(Text Detection)](#Text-Detection)\n- [知识蒸馏(Knowledge Distillation)](#KD)\n- [模型剪枝(Model Pruning)](#Pruning)\n- [图像压缩(Image Compression)](#IC)\n- [三维重建(3D Reconstruction)](#3D-Reconstruction)\n- [深度估计(Depth Estimation)](#Depth-Estimation)\n- [轨迹预测(Trajectory Prediction)](#TP)\n- [车道线检测(Lane Detection)](#Lane-Detection)\n- [图像描述(Image Captioning)](#Image-Captioning)\n- [视觉问答(Visual Question Answering)](#VQA)\n- [手语识别(Sign Language Recognition)](#SLR)\n- [视频预测(Video Prediction)](#Video-Prediction)\n- [新视点合成(Novel View Synthesis)](#NVS)\n- [Zero-Shot Learning(零样本学习)](#ZSL)\n- [立体匹配(Stereo Matching)](#Stereo-Matching)\n- [特征匹配(Feature Matching)](#Feature-Matching)\n- [场景图生成(Scene Graph Generation)](#SGG)\n- [隐式神经表示(Implicit Neural Representations)](#INR)\n- [图像质量评价(Image Quality Assessment)](#IQA)\n- [视频质量评价(Video Quality Assessment)](#Video-Quality-Assessment)\n- [数据集(Datasets)](#Datasets)\n- [新任务(New Tasks)](#New-Tasks)\n- [其他(Others)](#Others)\n\n<a name=\"3DGS\"></a>\n\n# 3DGS(Gaussian Splatting)\n\n**Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering**\n\n- Homepage: https://city-super.github.io/scaffold-gs/\n- Paper: https://arxiv.org/abs/2312.00109\n- Code: https://github.com/city-super/Scaffold-GS\n\n**GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis**\n\n- Homepage: https://shunyuanzheng.github.io/GPS-Gaussian \n- Paper: https://arxiv.org/abs/2312.02155\n- 
Code: https://github.com/ShunyuanZheng/GPS-Gaussian\n\n**GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians**\n\n- Paper: https://arxiv.org/abs/2312.02134\n- Code: https://github.com/huliangxiao/GaussianAvatar\n\n**GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting**\n\n- Paper: https://arxiv.org/abs/2311.14521\n- Code: https://github.com/buaacyw/GaussianEditor \n\n**Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction**\n\n- Homepage: https://ingra14m.github.io/Deformable-Gaussians/ \n- Paper: https://arxiv.org/abs/2309.13101\n- Code: https://github.com/ingra14m/Deformable-3D-Gaussians\n\n**SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes**\n\n- Homepage: https://yihua7.github.io/SC-GS-web/ \n- Paper: https://arxiv.org/abs/2312.14937\n- Code: https://github.com/yihua7/SC-GS\n\n**Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis**\n\n- Homepage: https://oppo-us-research.github.io/SpacetimeGaussians-website/ \n- Paper: https://arxiv.org/abs/2312.16812\n- Code: https://github.com/oppo-us-research/SpacetimeGaussians\n\n**DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization**\n\n- Homepage: https://fictionarry.github.io/DNGaussian/\n- Paper: https://arxiv.org/abs/2403.06912\n- Code: https://github.com/Fictionarry/DNGaussian\n\n**4D Gaussian Splatting for Real-Time Dynamic Scene Rendering**\n\n- Paper: https://arxiv.org/abs/2310.08528\n- Code: https://github.com/hustvl/4DGaussians\n\n**GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models**\n\n- Paper: https://arxiv.org/abs/2310.08529\n- Code: https://github.com/hustvl/GaussianDreamer\n\n<a name=\"Avatars\"></a>\n\n# Avatars\n\n**GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians**\n\n- Paper: https://arxiv.org/abs/2312.02134\n- 
Code: https://github.com/huliangxiao/GaussianAvatar\n\n**Real-Time Simulated Avatar from Head-Mounted Sensors**\n\n- Homepage: https://www.zhengyiluo.com/SimXR/\n- Paper: https://arxiv.org/abs/2403.06862\n\n<a name=\"Backbone\"></a>\n\n# Backbone\n\n**RepViT: Revisiting Mobile CNN From ViT Perspective**\n\n- Paper: https://arxiv.org/abs/2307.09283\n- Code: https://github.com/THU-MIG/RepViT\n\n**TransNeXt: Robust Foveal Visual Perception for Vision Transformers**\n\n- Paper: https://arxiv.org/abs/2311.17132\n- Code: https://github.com/DaiShiResearch/TransNeXt\n\n<a name=\"CLIP\"></a>\n\n# CLIP\n\n**Alpha-CLIP: A CLIP Model Focusing on Wherever You Want**\n\n- Paper: https://arxiv.org/abs/2312.03818\n- Code: https://github.com/SunzeY/AlphaCLIP\n\n**FairCLIP: Harnessing Fairness in Vision-Language Learning**\n\n- Paper: https://arxiv.org/abs/2403.19949\n- Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP\n\n<a name=\"MAE\"></a>\n\n# MAE\n\n<a name=\"Embodied-AI\"></a>\n\n# Embodied AI\n\n**EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI**\n\n- Homepage: https://tai-wang.github.io/embodiedscan/\n- Paper: https://arxiv.org/abs/2312.16170\n- Code: https://github.com/OpenRobotLab/EmbodiedScan\n\n**MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception**\n\n- Homepage: https://iranqin.github.io/MP5.github.io/ \n- Paper: https://arxiv.org/abs/2312.07472\n- Code: https://github.com/IranQin/MP5\n\n**LEMON: Learning 3D Human-Object Interaction Relation from 2D Images**\n\n- Paper: https://arxiv.org/abs/2312.08963\n- Code: https://github.com/yyvhang/lemon_3d \n\n<a name=\"GAN\"></a>\n\n# GAN\n\n<a name=\"OCR\"></a>\n\n# OCR\n\n**An Empirical Study of Scaling Law for OCR**\n\n- Paper: https://arxiv.org/abs/2401.00028\n- Code: https://github.com/large-ocr-model/large-ocr-model.github.io\n\n**ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting**\n\n- Paper: 
https://arxiv.org/abs/2403.00303\n- Code: https://github.com/PriNing/ODM \n\n<a name=\"NeRF\"></a>\n\n# NeRF\n\n**PIE-NeRF🍕: Physics-based Interactive Elastodynamics with NeRF**\n\n- Paper: https://arxiv.org/abs/2311.13099\n- Code: https://github.com/FYTalon/pienerf/ \n\n<a name=\"DETR\"></a>\n\n# DETR\n\n**DETRs Beat YOLOs on Real-time Object Detection**\n\n- Paper: https://arxiv.org/abs/2304.08069\n- Code: https://github.com/lyuwenyu/RT-DETR\n\n**Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement**\n\n- Paper: https://arxiv.org/abs/2403.16131\n- Code: https://github.com/xiuqhou/Salience-DETR\n\n<a name=\"Prompt\"></a>\n\n# Prompt\n\n<a name=\"MLLM\"></a>\n\n# 多模态大语言模型(MLLM)\n\n**mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration**\n\n- Paper: https://arxiv.org/abs/2311.04257\n- Code: https://github.com/X-PLUG/mPLUG-Owl/tree/main/mPLUG-Owl2\n\n**Link-Context Learning for Multimodal LLMs**\n\n- Paper: https://arxiv.org/abs/2308.07891\n- Code: https://github.com/isekai-portal/Link-Context-Learning/tree/main \n\n**OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation**\n\n- Paper: https://arxiv.org/abs/2311.17911\n- Code: https://github.com/shikiw/OPERA\n\n**Making Large Multimodal Models Understand Arbitrary Visual Prompts**\n\n- Homepage: https://vip-llava.github.io/ \n- Paper: https://arxiv.org/abs/2312.00784\n\n**Pink: Unveiling the power of referential comprehension for multi-modal llms**\n\n- Paper: https://arxiv.org/abs/2310.00582\n- Code: https://github.com/SY-Xuan/Pink\n\n**Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding**\n\n- Paper: https://arxiv.org/abs/2311.08046\n- Code: https://github.com/PKU-YuanGroup/Chat-UniVi\n\n**OneLLM: One Framework to Align All Modalities with Language**\n\n- Paper: https://arxiv.org/abs/2312.03700\n- Code: 
https://github.com/csuhan/OneLLM\n\n<a name=\"LLM\"></a>\n\n# 大语言模型(LLM)\n\n**VTimeLLM: Empower LLM to Grasp Video Moments**\n\n- Paper: https://arxiv.org/abs/2311.18445\n- Code: https://github.com/huangb23/VTimeLLM\n\n<a name=\"NAS\"></a>\n\n# NAS\n\n<a name=\"ReID\"></a>\n\n# ReID(重识别)\n\n**Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification**\n\n- Paper: https://arxiv.org/abs/2403.10254\n- Code: https://github.com/924973292/EDITOR\n\n**Noisy-Correspondence Learning for Text-to-Image Person Re-identification**\n\n- Paper: https://arxiv.org/abs/2308.09911\n- Code: https://github.com/QinYang79/RDE\n\n<a name=\"Diffusion\"></a>\n\n# 扩散模型(Diffusion Models)\n\n**InstanceDiffusion: Instance-level Control for Image Generation**\n\n- Homepage: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/\n- Paper: https://arxiv.org/abs/2402.03290\n- Code: https://github.com/frank-xwang/InstanceDiffusion\n\n**Residual Denoising Diffusion Models**\n\n- Paper: https://arxiv.org/abs/2308.13712\n- Code: https://github.com/nachifur/RDDM\n\n**DeepCache: Accelerating Diffusion Models for Free**\n\n- Paper: https://arxiv.org/abs/2312.00858\n- Code: https://github.com/horseee/DeepCache\n\n**DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations**\n\n- Homepage: https://tianhao-qi.github.io/DEADiff/\n- Paper: https://arxiv.org/abs/2403.06951\n- Code: https://github.com/Tianhao-Qi/DEADiff_code\n\n**SVGDreamer: Text Guided SVG Generation with Diffusion Model**\n\n- Paper: https://arxiv.org/abs/2312.16476\n- Code: https://ximinng.github.io/SVGDreamer-project/\n\n**InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model**\n\n- Paper: https://arxiv.org/abs/2312.05849\n- Code: https://github.com/jiuntian/interactdiffusion\n\n**MMA-Diffusion: MultiModal Attack on Diffusion Models**\n\n- Paper: https://arxiv.org/abs/2311.17516\n- Code: https://github.com/yangyijune/MMA-Diffusion\n\n**VMC: Video Motion 
Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models**\n\n- Homepage: https://video-motion-customization.github.io/\n- Paper: https://arxiv.org/abs/2312.00845\n- Code: https://github.com/HyeonHo99/Video-Motion-Customization\n\n<a name=\"Vision-Transformer\"></a>\n\n# Vision Transformer\n\n**TransNeXt: Robust Foveal Visual Perception for Vision Transformers**\n\n- Paper: https://arxiv.org/abs/2311.17132\n- Code: https://github.com/DaiShiResearch/TransNeXt\n\n**RepViT: Revisiting Mobile CNN From ViT Perspective**\n\n- Paper: https://arxiv.org/abs/2307.09283\n- Code: https://github.com/THU-MIG/RepViT\n\n**A General and Efficient Training for Transformer via Token Expansion**\n\n- Paper: https://arxiv.org/abs/2404.00672\n- Code: https://github.com/Osilly/TokenExpansion\n\n<a name=\"VL\"></a>\n\n# 视觉和语言(Vision-Language)\n\n**PromptKD: Unsupervised Prompt Distillation for Vision-Language Models**\n\n- Paper: https://arxiv.org/abs/2403.02781\n- Code: https://github.com/zhengli97/PromptKD\n\n**FairCLIP: Harnessing Fairness in Vision-Language Learning**\n\n- Paper: https://arxiv.org/abs/2403.19949\n- Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP\n\n<a name=\"Object-Detection\"></a>\n\n# 目标检测(Object Detection)\n\n**DETRs Beat YOLOs on Real-time Object Detection**\n\n- Paper: https://arxiv.org/abs/2304.08069\n- Code: https://github.com/lyuwenyu/RT-DETR\n\n**Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation**\n\n- Paper: https://arxiv.org/abs/2312.01220\n- Code: https://github.com/ZPDu/Boosting-Object-Detection-with-Zero-Shot-Day-Night-Domain-Adaptation\n\n**YOLO-World: Real-Time Open-Vocabulary Object Detection**\n\n- Paper: https://arxiv.org/abs/2401.17270\n- Code: https://github.com/AILab-CVC/YOLO-World\n\n**Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement**\n\n- Paper: https://arxiv.org/abs/2403.16131\n- Code: https://github.com/xiuqhou/Salience-DETR\n\n<a 
name=\"Anomaly-Detection\"></a>\n\n# 异常检测(Anomaly Detection)\n\n**Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection**\n\n- Paper: https://arxiv.org/abs/2310.12790\n- Code: https://github.com/mala-lab/AHL\n\n<a name=\"VT\"></a>\n\n# 目标跟踪(Object Tracking)\n\n**Delving into the Trajectory Long-tail Distribution for Muti-object Tracking**\n\n- Paper: https://arxiv.org/abs/2403.04700\n- Code: https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT \n\n<a name=\"Semantic-Segmentation\"></a>\n\n# 语义分割(Semantic Segmentation)\n\n**Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2312.04265\n- Code: https://github.com/w1oves/Rein\n\n**SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2311.15537\n- Code: https://github.com/xb534/SED \n\n<a name=\"MI\"></a>\n\n# 医学图像(Medical Image)\n\n**Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology**\n\n- Paper: https://arxiv.org/abs/2402.17228\n- Code: https://github.com/DearCaat/RRT-MIL\n\n**VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis**\n\n- Paper: https://arxiv.org/abs/2402.17300\n- Code: https://github.com/Luffy03/VoCo\n\n**ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images**\n\n- Paper: https://arxiv.org/abs/2311.15264\n- Code: https://github.com/nicoboou/chada_vit \n\n<a name=\"MIS\"></a>\n\n# 医学图像分割(Medical Image Segmentation)\n\n\n\n<a name=\"Autonomous-Driving\"></a>\n\n# 自动驾驶(Autonomous Driving)\n\n**UniPAD: A Universal Pre-training Paradigm for Autonomous Driving**\n\n- Paper: https://arxiv.org/abs/2310.08370\n- Code: https://github.com/Nightmare-n/UniPAD\n\n**Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications**\n\n- Paper: 
https://arxiv.org/abs/2311.17663\n- Code: https://github.com/haomo-ai/Cam4DOcc\n\n**Memory-based Adapters for Online 3D Scene Perception**\n\n- Paper: https://arxiv.org/abs/2403.06974\n- Code: https://github.com/xuxw98/Online3D\n\n**Symphonize 3D Semantic Scene Completion with Contextual Instance Queries**\n\n- Paper: https://arxiv.org/abs/2306.15670\n- Code: https://github.com/hustvl/Symphonies\n\n**A Real-world Large-scale Dataset for Roadside Cooperative Perception**\n\n- Paper: https://arxiv.org/abs/2403.10145\n- Code: https://github.com/AIR-THU/DAIR-RCooper\n\n**Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving**\n\n- Paper: https://arxiv.org/abs/2403.07535\n- Code: https://github.com/Junda24/AFNet\n\n**Traffic Scene Parsing through the TSP6K Dataset**\n\n- Paper: https://arxiv.org/pdf/2303.02835.pdf\n- Code: https://github.com/PengtaoJiang/TSP6K\n\n<a name=\"3D-Point-Cloud\"></a>\n\n# 3D点云(3D Point Cloud)\n\n\n\n<a name=\"3DOD\"></a>\n\n# 3D目标检测(3D Object Detection)\n\n**PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection**\n\n- Paper: https://arxiv.org/abs/2312.08371\n- Code: https://github.com/kuanchihhuang/PTT\n\n**UniMODE: Unified Monocular 3D Object Detection**\n\n- Paper: https://arxiv.org/abs/2402.18573\n\n<a name=\"3DSS\"></a>\n\n# 3D语义分割(3D Semantic Segmentation)\n\n<a name=\"Image-Editing\"></a>\n\n# 图像编辑(Image Editing)\n\n**Edit One for All: Interactive Batch Image Editing**\n\n- Homepage: https://thaoshibe.github.io/edit-one-for-all\n- Paper: https://arxiv.org/abs/2401.10219\n- Code: https://github.com/thaoshibe/edit-one-for-all\n\n<a name=\"Video-Editing\"></a>\n\n# 视频编辑(Video Editing)\n\n**MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers**\n\n- Homepage: [https://maskint.github.io](https://maskint.github.io/)\n- Paper: https://arxiv.org/abs/2312.12468\n\n<a name=\"LLV\"></a>\n\n# Low-level Vision\n\n**Residual Denoising Diffusion Models**\n\n- Paper: 
https://arxiv.org/abs/2308.13712\n- Code: https://github.com/nachifur/RDDM\n\n**Boosting Image Restoration via Priors from Pre-trained Models**\n\n- Paper: https://arxiv.org/abs/2403.06793\n\n<a name=\"SR\"></a>\n\n# 超分辨率(Super-Resolution)\n\n**SeD: Semantic-Aware Discriminator for Image Super-Resolution**\n\n- Paper: https://arxiv.org/abs/2402.19387\n- Code: https://github.com/lbc12345/SeD\n\n**APISR: Anime Production Inspired Real-World Anime Super-Resolution**\n\n- Paper: https://arxiv.org/abs/2403.01598\n- Code: https://github.com/Kiteretsu77/APISR \n\n<a name=\"Denoising\"></a>\n\n# 去噪(Denoising)\n\n## 图像去噪(Image Denoising)\n\n<a name=\"3D-Human-Pose-Estimation\"></a>\n\n# 3D人体姿态估计(3D Human Pose Estimation)\n\n**Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation**\n\n- Paper: https://arxiv.org/abs/2311.12028\n- Code: https://github.com/NationalGAILab/HoT \n\n<a name=\"Image-Generation\"></a>\n\n# 图像生成(Image Generation)\n\n**InstanceDiffusion: Instance-level Control for Image Generation**\n\n- Homepage: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/\n\n- Paper: https://arxiv.org/abs/2402.03290\n- Code: https://github.com/frank-xwang/InstanceDiffusion\n\n**ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations**\n\n- Homepage: https://eclipse-t2i.vercel.app/\n- Paper: https://arxiv.org/abs/2312.04655\n\n- Code: https://github.com/eclipse-t2i/eclipse-inference\n\n**Instruct-Imagen: Image Generation with Multi-modal Instruction**\n\n- Paper: https://arxiv.org/abs/2401.01952\n\n**Residual Denoising Diffusion Models**\n\n- Paper: https://arxiv.org/abs/2308.13712\n- Code: https://github.com/nachifur/RDDM\n\n**UniGS: Unified Representation for Image Generation and Segmentation**\n\n- Paper: https://arxiv.org/abs/2312.01985\n\n**Multi-Instance Generation Controller for Text-to-Image Synthesis**\n\n- Paper: https://arxiv.org/abs/2402.05408\n- Code: https://github.com/limuloo/migc\n\n**SVGDreamer: Text Guided SVG 
Generation with Diffusion Model**\n\n- Paper: https://arxiv.org/abs/2312.16476\n- Code: https://ximinng.github.io/SVGDreamer-project/\n\n**InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model**\n\n- Paper: https://arxiv.org/abs/2312.05849\n- Code: https://github.com/jiuntian/interactdiffusion\n\n**Ranni: Taming Text-to-Image Diffusion for Accurate Prompt Following**\n\n- Paper: https://arxiv.org/abs/2311.17002\n- Code: https://github.com/ali-vilab/Ranni\n\n<a name=\"Video-Generation\"></a>\n\n# 视频生成(Video Generation)\n\n**Vlogger: Make Your Dream A Vlog**\n\n- Paper: https://arxiv.org/abs/2401.09414\n- Code: https://github.com/Vchitect/Vlogger\n\n**VBench: Comprehensive Benchmark Suite for Video Generative Models**\n\n- Homepage: https://vchitect.github.io/VBench-project/\n- Paper: https://arxiv.org/abs/2311.17982\n- Code: https://github.com/Vchitect/VBench\n\n**VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models**\n\n- Homepage: https://video-motion-customization.github.io/\n- Paper: https://arxiv.org/abs/2312.00845\n- Code: https://github.com/HyeonHo99/Video-Motion-Customization\n\n<a name=\"3D-Generation\"></a>\n\n# 3D生成(3D Generation)\n\n**CityDreamer: Compositional Generative Model of Unbounded 3D Cities**\n\n- Homepage: https://haozhexie.com/project/city-dreamer/\n- Paper: https://arxiv.org/abs/2309.00610\n- Code: https://github.com/hzxie/city-dreamer\n\n**LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching**\n\n- Paper: https://arxiv.org/abs/2311.11284\n- Code: https://github.com/EnVision-Research/LucidDreamer\n\n<a name=\"Video-Understanding\"></a>\n\n# 视频理解(Video Understanding)\n\n**MVBench: A Comprehensive Multi-modal Video Understanding Benchmark**\n\n- Paper: https://arxiv.org/abs/2311.17005\n- Code: https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2\n\n<a name=\"KD\"></a>\n\n# 知识蒸馏(Knowledge Distillation)\n\n**Logit Standardization in Knowledge 
Distillation**\n\n- Paper: https://arxiv.org/abs/2403.01427\n- Code: https://github.com/sunshangquan/logit-standardization-KD\n\n**Efficient Dataset Distillation via Minimax Diffusion**\n\n- Paper: https://arxiv.org/abs/2311.15529\n- Code: https://github.com/vimar-gu/MinimaxDiffusion\n\n<a name=\"Stereo-Matching\"></a>\n\n# 立体匹配(Stereo Matching)\n\n**Neural Markov Random Field for Stereo Matching**\n\n- Paper: https://arxiv.org/abs/2403.11193\n- Code: https://github.com/aeolusguan/NMRF \n\n<a name=\"SGG\"></a>\n\n# 场景图生成(Scene Graph Generation)\n\n**HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation**\n\n- Homepage: https://zhangce01.github.io/HiKER-SGG/ \n- Paper : https://arxiv.org/abs/2403.12033\n- Code: https://github.com/zhangce01/HiKER-SGG\n\n<a name=\"Video-Quality-Assessment\"></a>\n\n# 视频质量评价(Video Quality Assessment)\n\n**KVQ: Kaleidoscope Video Quality Assessment for Short-form Videos**\n\n- Homepage: https://lixinustc.github.io/projects/KVQ/ \n\n- Paper: https://arxiv.org/abs/2402.07220\n- Code: https://github.com/lixinustc/KVQ-Challenge-CVPR-NTIRE2024\n\n<a name=\"Datasets\"></a>\n\n# 数据集(Datasets)\n\n**A Real-world Large-scale Dataset for Roadside Cooperative Perception**\n\n- Paper: https://arxiv.org/abs/2403.10145\n- Code: https://github.com/AIR-THU/DAIR-RCooper\n\n**Traffic Scene Parsing through the TSP6K Dataset**\n\n- Paper: https://arxiv.org/pdf/2303.02835.pdf\n- Code: https://github.com/PengtaoJiang/TSP6K \n\n<a name=\"Others\"></a>\n\n# 其他(Others)\n\n**Object Recognition as Next Token Prediction**\n\n- Paper: https://arxiv.org/abs/2312.02142\n- Code: https://github.com/kaiyuyue/nxtp\n\n**ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks**\n\n- Paper: https://arxiv.org/abs/2306.14525\n- Code: https://parameternet.github.io/ \n\n**Seamless Human Motion Composition with Blended Positional Encodings**\n\n- Paper: https://arxiv.org/abs/2402.15509\n- Code: 
https://github.com/BarqueroGerman/FlowMDM\n\n**LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning**\n\n- Homepage: https://ll3da.github.io/\n- Paper: https://arxiv.org/abs/2311.18651\n- Code: https://github.com/Open3DA/LL3DA\n\n**CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update**\n\n- Homepage: https://clova-tool.github.io/\n- Paper: https://arxiv.org/abs/2312.10908\n\n**MoMask: Generative Masked Modeling of 3D Human Motions**\n\n- Paper: https://arxiv.org/abs/2312.00063\n- Code: https://github.com/EricGuo5513/momask-codes\n\n**Amodal Ground Truth and Completion in the Wild**\n\n- Homepage: https://www.robots.ox.ac.uk/~vgg/research/amodal/\n- Paper: https://arxiv.org/abs/2312.17247\n- Code: https://github.com/Championchess/Amodal-Completion-in-the-Wild\n\n**Improved Visual Grounding through Self-Consistent Explanations**\n\n- Paper: https://arxiv.org/abs/2312.04554\n- Code: https://github.com/uvavision/SelfEQ\n\n**ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object**\n\n- Homepage: https://chenshuang-zhang.github.io/imagenet_d/\n- Paper: https://arxiv.org/abs/2403.18775\n- Code: https://github.com/chenshuang-zhang/imagenet_d\n\n**Learning from Synthetic Human Group Activities**\n\n- Homepage: https://cjerry1243.github.io/M3Act/\n- Paper: https://arxiv.org/abs/2306.16772\n- Code: https://github.com/cjerry1243/M3Act\n\n**A Cross-Subject Brain Decoding Framework**\n\n- Homepage: https://littlepure2333.github.io/MindBridge/\n- Paper: https://arxiv.org/abs/2404.07850\n- Code: https://github.com/littlepure2333/MindBridge\n\n**Multi-Task Dense Prediction via Mixture of Low-Rank Experts**\n\n- Paper: https://arxiv.org/abs/2403.17749\n- Code: https://github.com/YuqiYang213/MLoRE\n\n**Contrastive Mean-Shift Learning for Generalized Category Discovery**\n\n- Homepage: https://postech-cvlab.github.io/cms/\n- Paper: https://arxiv.org/abs/2404.09451\n- Code: 
https://github.com/sua-choi/CMS\n  "
  },
  {
    "path": "CVPR2025-Papers-with-Code.md",
    "content": "# CVPR 2025 论文和开源项目合集(Papers with Code)\n\nCVPR 2025 decisions are now available on OpenReview！22.1% = 2878 / 13008\n\n\n> 注1：欢迎各位大佬提交issue，分享CVPR 2025论文和开源项目！\n>\n> 注2：关于往年CV顶会论文以及其他优质CV论文和大盘点，详见： https://github.com/amusi/daily-paper-computer-vision\n>\n> - [ICCV 2025](https://github.com/amusi/ICCV2025-Papers-with-Code)\n> - [ECCV 2024](https://github.com/amusi/ECCV2024-Papers-with-Code)\n> - [CVPR 2024](CVPR2024-Papers-with-Code.md)\n\n欢迎扫码加入【CVer学术交流群】，可以获取CVPR 2025等最前沿工作！这是最大的计算机视觉AI知识星球！每日更新，第一时间分享最新最前沿的计算机视觉、AIGC、扩散模型、多模态、深度学习、自动驾驶、医疗影像和遥感等方向的学习资料，快加入学起来！\n\n![](CVer学术交流群.png)\n\n# 【CVPR 2025 论文开源目录】\n\n- [3DGS(Gaussian Splatting)](#3DGS)\n- [Agent)](#Agent)\n- [Avatars](#Avatars)\n- [Backbone](#Backbone)\n- [CLIP](#CLIP)EVOS\n- [Mamba](#Mamba)\n- [Embodied AI](#Embodied-AI)\n- [GAN](#GAN)\n- [GNN](#GNN)\n- [多模态大语言模型(MLLM)](#MLLM)\n- [大语言模型(LLM)](#LLM)\n- [NAS](#NAS)\n- [OCR](#OCR)\n- [NeRF](#NeRF)\n- [DETR](#DETR)\n- [扩散模型(Diffusion Models)](#Diffusion)\n- [ReID(重识别)](#ReID)\n- [长尾分布(Long-Tail)](#Long-Tail)\n- [Vision Transformer](#Vision-Transformer)\n- [视觉和语言(Vision-Language)](#VL)\n- [自监督学习(Self-supervised Learning)](#SSL)\n- [数据增强(Data Augmentation)](#DA)\n- [目标检测(Object Detection)](#Object-Detection)\n- [异常检测(Anomaly Detection)](#Anomaly-Detection)\n- [目标跟踪(Visual Tracking)](#VT)\n- [语义分割(Semantic Segmentation)](#Semantic-Segmentation)\n- [实例分割(Instance Segmentation)](#Instance-Segmentation)\n- [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation)\n- [医学图像(Medical Image)](#MI)\n- [医学图像分割(Medical Image Segmentation)](#MIS)\n- [视频目标分割(Video Object Segmentation)](#VOS)\n- [视频实例分割(Video Instance Segmentation)](#VIS)\n- [参考图像分割(Referring Image Segmentation)](#RIS)\n- [图像抠图(Image Matting)](#Matting)\n- [图像编辑(Image Editing)](#Image-Editing)\n- [Low-level Vision](#LLV)\n- [超分辨率(Super-Resolution)](#SR)\n- [去噪(Denoising)](#Denoising)\n- [去模糊(Deblur)](#Deblur)\n- [自动驾驶(Autonomous Driving)](#Autonomous-Driving)\n- [3D点云(3D Point 
Cloud)](#3D-Point-Cloud)\n- [3D目标检测(3D Object Detection)](#3DOD)\n- [3D语义分割(3D Semantic Segmentation)](#3DSS)\n- [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking)\n- [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC)\n- [3D配准(3D Registration)](#3D-Registration)\n- [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation)\n- [3D人体Mesh估计(3D Human Mesh Estimation)](#3D-Human-Pose-Estimation)\n- [3D Visual Grounding(3D视觉定位)](#3DVG)\n- [医学图像(Medical Image)](#Medical-Image)\n- [图像生成(Image Generation)](#Image-Generation)\n- [视频生成(Video Generation)](#Video-Generation)\n- [3D生成(3D Generation)](#3D-Generation)\n- [视频理解(Video Understanding)](#Video-Understanding)\n- [行为检测(Action Detection)](#Action-Detection)\n- [具身智能(Embodied AI)](#Embodied)\n- [文本检测(Text Detection)](#Text-Detection)\n- [知识蒸馏(Knowledge Distillation)](#KD)\n- [模型剪枝(Model Pruning)](#Pruning)\n- [图像压缩(Image Compression)](#IC)\n- [三维重建(3D Reconstruction)](#3D-Reconstruction)\n- [深度估计(Depth Estimation)](#Depth-Estimation)\n- [轨迹预测(Trajectory Prediction)](#TP)\n- [车道线检测(Lane Detection)](#Lane-Detection)\n- [图像描述(Image Captioning)](#Image-Captioning)\n- [视觉问答(Visual Question Answering)](#VQA)\n- [手语识别(Sign Language Recognition)](#SLR)\n- [视频预测(Video Prediction)](#Video-Prediction)\n- [新视点合成(Novel View Synthesis)](#NVS)\n- [Zero-Shot Learning(零样本学习)](#ZSL)\n- [立体匹配(Stereo Matching)](#Stereo-Matching)\n- [特征匹配(Feature Matching)](#Feature-Matching)\n- [暗光图像增强(Low-light Image Enhancement)](#Low-light)\n- [场景图生成(Scene Graph Generation)](#SGG)\n- [风格迁移(Style Transfer)](#ST)\n- [隐式神经表示(Implicit Neural Representations)](#INR)\n- [图像质量评价(Image Quality Assessment)](#IQA)\n- [视频质量评价(Video Quality Assessment)](#Video-Quality-Assessment)\n- [压缩感知(Compressive Sensing)](#CS)\n- [数据集(Datasets)](#Datasets)\n- [新任务(New Tasks)](#New-Tasks)\n- [其他(Others)](#Others)\n\n<a name=\"3DGS\"></a>\n\n# 3DGS(Gaussian Splatting)\n\n\n<a name=\"Agent\"></a>\n\n# Agent\n\n**SpiritSight Agent: Advanced GUI Agent with One Look**\n\n- Paper: 
https://arxiv.org/abs/2503.03196\n- Code: https://hzhiyuan.github.io/SpiritSight-Agent\n\n\n<a name=\"Avatars\"></a>\n\n# Avatars\n\n\n# Backbone\n\n**Building Vision Models upon Heat Conduction**\n\n- Paper: https://arxiv.org/abs/2405.16555\n- Code: https://github.com/MzeroMiko/vHeat\n\n**LSNet: See Large, Focus Small**\n\n- Paper: https://arxiv.org/abs/2503.23135\n- Code: https://github.com/jameslahm/lsnet\n\n\n<a name=\"CLIP\"></a>\n\n# CLIP\n\n\n\n<a name=\"Mamba\"></a>\n\n# Mamba\n\n\n**MambaVision: A Hybrid Mamba-Transformer Vision Backbone**\n\n- Paper: https://arxiv.org/abs/2407.08083\n- Code: https://github.com/NVlabs/MambaVision\n\n**MobileMamba: Lightweight Multi-Receptive Visual Mamba Network**\n\n- Paper: https://arxiv.org/abs/2411.15941\n- Code: https://github.com/lewandofskee/MobileMamba\n\n**MambaIC: State Space Models for High-Performance Learned Image Compression**\n\n- Paper: https://arxiv.org/abs/2503.12461\n- Code: https://arxiv.org/abs/2503.12461\n\n<a name=\"Embodied-AI\"></a>\n\n# Embodied AI\n\n**CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos**\n\n- Project: https://ai4ce.github.io/CityWalker/\n- Paper: https://arxiv.org/abs/2411.17820\n- Code: https://github.com/ai4ce/CityWalker\n\n\n<a name=\"GAN\"></a>\n\n# GAN\n\n<a name=\"OCR\"></a>\n\n# OCR\n\n\n<a name=\"NeRF\"></a>\n\n# NeRF\n\n\n\n<a name=\"DETR\"></a>\n\n# DETR\n\n**Mr. 
DETR: Instructive Multi-Route Training for Detection Transformers**\n\n- Paper: https://arxiv.org/abs/2412.10028\n- Code: https://github.com/Visual-AI/Mr.DETR\n\n\n<a name=\"Prompt\"></a>\n\n# Prompt\n\n<a name=\"MLLM\"></a>\n\n# 多模态大语言模型(MLLM)\n\n**LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences**\n\n- Paper： https://arxiv.org/abs/2412.01292\n- Code: https://github.com/Hoyyyaard/LSceneLLM\n\n\n**DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution**\n\n- Paper: https://arxiv.org/abs/2405.16071\n- Code: https://github.com/callsys/DynRefer\n\n\n**Retrieval-Augmented Personalization for Multimodal Large Language Models**\n\n- Project Page: https://hoar012.github.io/RAP-Project/\n- Paper: https://arxiv.org/abs/2410.13360\n- Code: https://github.com/Hoar012/RAP-MLLM\n\n**BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models**\n\n- Paper: https://arxiv.org/abs/2411.15232\n- Code: https://github.com/HealthX-Lab/BiomedCoOp\n\n**FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression**\n\n- Paper: https://arxiv.org/abs/2412.04317\n- Code: https://github.com/codefanw/FlashSloth\n\n**MMRL: Multi-Modal Representation Learning for Vision-Language Models**\n\n- Paper: https://arxiv.org/abs/2503.08497\n- Code: https://github.com/yunncheng/MMRL\n\n**PAVE: Patching and Adapting Video Large Language Models**\n\n- Paper: https://arxiv.org/abs/2503.19794\n- Code: https://github.com/dragonlzm/PAVE\n\n**AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization**\n\n- Paper: https://arxiv.org/abs/2503.23733\n- Code: https://github.com/THUNLP-MT/AdaMMS\n\n\n<a name=\"LLM\"></a>\n\n# 大语言模型(LLM)\n\n\n\n\n<a name=\"NAS\"></a>\n\n# NAS\n\n<a name=\"ReID\"></a>\n\n# ReID(重识别)\n\n**From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization**\n\n- Paper: https://arxiv.org/abs/2503.00938\n- Code: 
https://github.com/yuanc3/Pose2ID\n\n\n**AirRoom: Objects Matter in Room Reidentification**\n\n- Project: https://sairlab.org/airroom/\n- Paper: https://arxiv.org/abs/2503.01130\n\n\n**IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification**\n\n- Paper: https://arxiv.org/abs/2503.10324\n- Code: https://github.com/924973292/IDEA\n\n\n\n<a name=\"Diffusion\"></a>\n\n# 扩散模型(Diffusion Models)\n\n**TinyFusion: Diffusion Transformers Learned Shallow**\n\n- Paper: https://arxiv.org/abs/2412.01199\n- Code: https://github.com/VainF/TinyFusion\n\n**DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture**\n\n- Paper: https://arxiv.org/abs/2409.03550\n- Code: https://github.com/qianlong0502/DKDM\n\n**Tiled Diffusion**\n\n- Homepage: https://madaror.github.io/tiled-diffusion.github.io/\n- Paper: https://arxiv.org/abs/2412.15185\n- Code: https://github.com/madaror/tiled-diffusion\n\n\n<a name=\"Vision-Transformer\"></a>\n\n# Vision Transformer\n\n\n\n<a name=\"VL\"></a>\n\n# 视觉和语言(Vision-Language)\n\n**NLPrompt: Noise-Label Prompt Learning for Vision-Language Models**\n\n- Paper: https://arxiv.org/abs/2412.01256\n- Code: https://github.com/qunovo/NLPrompt\n\n**PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability**\n\n- Paper: https://arxiv.org/abs/2503.08481\n- Code: https://github.com/unira-zwj/PhysVLM\n\n**MMRL: Multi-Modal Representation Learning for Vision-Language Models**\n\n- Paper: https://arxiv.org/abs/2503.08497\n- Code: https://github.com/yunncheng/MMRL\n\n\n<a name=\"Object-Detection\"></a>\n\n# 目标检测(Object Detection)\n\n\n**LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models**\n\n- Paper: https://arxiv.org/abs/2501.18954\n- Code：https://github.com/iSEE-Laboratory/LLMDet\n\n**Mr. 
DETR: Instructive Multi-Route Training for Detection Transformers**\n\n- Paper: https://arxiv.org/abs/2412.10028\n- Code: https://github.com/Visual-AI/Mr.DETR\n\n\n<a name=\"Anomaly-Detection\"></a>\n\n# 异常检测(Anomaly Detection)\n\n\n\n<a name=\"VT\"></a>\n\n# 目标跟踪(Object Tracking)\n\n**Multiple Object Tracking as ID Prediction**\n\n- Paper: https://arxiv.org/abs/2403.16848\n- Code: https://github.com/MCG-NJU/MOTIP\n\n**Omnidirectional Multi-Object Tracking**\n\n- Paper: https://arxiv.org/abs/2503.04565\n- Code: https://github.com/xifen523/OmniTrack\n\n\n<a name=\"MI\"></a>\n\n# 医学图像(Medical Image)\n\n\n**BrainMVP: Multi-modal Vision Pre-training for Medical Image Analysis**\n\n- Paper: https://arxiv.org/abs/2410.10604\n- Code: https://github.com/shaohao011/BrainMVP\n\n\n<a name=\"MIS\"></a>\n\n# 医学图像分割(Medical Image Segmentation)\n\n**Test-Time Domain Generalization via Universe Learning: A Multi-Graph Matching Approach for Medical Image Segmentation**\n\n- Paper: https://arxiv.org/abs/2503.13012\n- Code: https://github.com/Yore0/TTDG-MGM\n\n\n<a name=\"Autonomous-Driving\"></a>\n\n# 自动驾驶(Autonomous Driving)\n\n**LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes**\n\n- Project: https://ldkong.com/LiMoE\n- Paper: https://arxiv.org/abs/2501.04004\n- Code: https://github.com/Xiangxu-0103/LiMoE\n\n\n<a name=\"3D-Point-Cloud\"></a>\n\n# 3D点云(3D Point Cloud)\n\n**Unlocking Generalization Power in LiDAR Point Cloud Registration**\n\n- Paper: https://arxiv.org/abs/2503.10149\n- Code: https://github.com/peakpang/UGP\n\n\n<a name=\"3DOD\"></a>\n\n# 3D目标检测(3D Object Detection)\n\n\n\n<a name=\"3DSS\"></a>\n\n# 3D语义分割(3D Semantic Segmentation)\n\n\n\n\n\n<a name=\"LLV\"></a>\n\n# Low-level Vision\n\n\n\n<a name=\"SR\"></a>\n\n# 超分辨率(Super-Resolution)\n\n**AESOP: Auto-Encoded Supervision for Perceptual Image Super-Resolution**\n\n- Paper: https://arxiv.org/abs/2412.00124\n- Code: https://github.com/2minkyulee/AESOP-Auto-Encoded-Supervision-for-Perceptual-Image-Super-Resolution\n\n\n<a name=\"Denoising\"></a>\n\n# 
去噪(Denoising)\n\n## 图像去噪(Image Denoising)\n\n<a name=\"3D-Human-Pose-Estimation\"></a>\n\n# 3D人体姿态估计(3D Human Pose Estimation)\n\n**Reconstructing Humans with a Biomechanically Accurate Skeleton**\n\n- Homepage: https://isshikihugh.github.io/HSMR/\n- Code: https://github.com/IsshikiHugh/HSMR\n\n<a name=\"3DVG\"></a>\n\n# 3D Visual Grounding(3D视觉定位)\n\n**ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding**\n\n- Homepage: https://pqh22.github.io/projects/ProxyTransformation/index.html\n- Paper: https://arxiv.org/abs/2502.19247\n- Code: https://github.com/pqh22/ProxyTransformation\n\n\n<a name=\"Image-Generation\"></a>\n\n# 图像生成(Image Generation)\n\n**Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models**\n\n- Paper: https://arxiv.org/abs/2501.01423\n- Code: https://github.com/hustvl/LightningDiT\n\n**SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models**\n\n- Paper: https://arxiv.org/abs/2412.04852\n- Code: https://github.com/taco-group/SleeperMark\n\n\n**TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation**\n\n- Homepage: https://byteflow-ai.github.io/TokenFlow/\n- Paper: https://arxiv.org/abs/2412.03069\n- Code: https://github.com/ByteFlow-AI/TokenFlow\n\n**PAR: Parallelized Autoregressive Visual Generation**\n\n- Project: https://epiphqny.github.io/PAR-project/\n- Paper: https://arxiv.org/abs/2412.15119\n- Code: https://github.com/Epiphqny/PAR\n\n\n**Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis**\n\n- Project: https://generative-photography.github.io/project/\n- Paper: https://arxiv.org/abs/2412.02168\n- Code: https://github.com/pandayuanyu/generative-photography\n\n\n**OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation**\n\n- Project Page: https://opening-benchmark.github.io/\n- Paper: https://arxiv.org/abs/2411.18499\n- Code: 
https://github.com/LanceZPF/OpenING\n\n\n\n\n<a name=\"Video-Generation\"></a>\n\n# 视频生成(Video Generation)\n\n**Identity-Preserving Text-to-Video Generation by Frequency Decomposition**\n\n- Paper: https://arxiv.org/abs/2411.17440\n- Code: https://github.com/PKU-YuanGroup/ConsisID\n\n\n**Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models**\n\n- Paper: https://arxiv.org/abs/2407.15642\n- Code: https://github.com/maxin-cn/Cinemo\n\n**X-Dyna: Expressive Dynamic Human Image Animation**\n\n- Paper: https://arxiv.org/abs/2501.10021\n- Code: https://github.com/bytedance/X-Dyna\n\n**PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation**\n\n- Paper: https://arxiv.org/pdf/2412.00596\n- Code: https://github.com/pittisl/PhyT2V\n\n\n**Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model**\n\n- Project: https://liewfeng.github.io/TeaCache/\n- Paper: https://arxiv.org/abs/2411.19108\n- Code: https://github.com/ali-vilab/TeaCache\n\n\n**AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion**\n\n- Project: https://iva-mzsun.github.io/AR-Diffusion\n- Paper: https://arxiv.org/abs/2503.07418\n- Code: https://github.com/iva-mzsun/AR-Diffusion\n\n\n<a name=\"Image-Editing\"></a>\n\n# 图像编辑(Image Editing)\n\n**Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing**\n\n- Paper: https://arxiv.org/abs/2411.16832\n- Code: https://github.com/taco-group/FaceLock\n\n\n**h-Edit: Effective and Flexible Diffusion-Based Editing via Doob’s h-Transform**\n\n- Paper: https://arxiv.org/abs/2503.02187\n- Code: https://github.com/nktoan/h-edit\n\n\n<a name=\"Video-Editing\"></a>\n\n# 视频编辑(Video Editing)\n\n\n\n<a name=\"3D-Generation\"></a>\n\n# 3D生成(3D Generation)\n\n\n**Generative Gaussian Splatting for Unbounded 3D City Generation**\n\n- Project: https://haozhexie.com/project/gaussian-city\n- Paper: https://arxiv.org/abs/2406.06526\n- Code: 
https://github.com/hzxie/GaussianCity\n\n**StdGEN: Semantic-Decomposed 3D Character Generation from Single Images**\n\n- Project: https://stdgen.github.io/\n- Paper: https://arxiv.org/abs/2411.05738\n- Code: https://github.com/hyz317/StdGEN\n\n\n<a name=\"3D-Reconstruction\"></a>\n\n# 3D重建(3D Reconstruction)\n\n**Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass**\n\n- Project: https://fast3r-3d.github.io/\n- Paper: https://arxiv.org/abs/2501.13928\n\n\n<a name=\"HMG\"></a>\n\n# 人体运动生成(Human Motion Generation)\n\n**SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance**\n\n- Project: https://4dvlab.github.io/project_page/semgeomo/\n- Paper: https://arxiv.org/abs/2503.01291\n- Code: https://github.com/4DVLab/SemGeoMo\n\n<a name=\"Video-Understanding\"></a>\n\n# 视频理解(Video Understanding)\n\n**Temporal Grounding Videos like Flipping Manga**\n\n- Paper: https://arxiv.org/abs/2411.10332\n- Code: https://github.com/yongliang-wu/NumPro\n\n<a name=\"Embodied\"></a>\n\n# 具身智能(Embodied AI)\n\n**Universal Actions for Enhanced Embodied Foundation Models**\n\n- Project: https://2toinf.github.io/UniAct/\n- Paper: https://arxiv.org/abs/2501.10105\n- Code: https://github.com/2toinf/UniAct\n\n**PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability**\n\n- Paper: https://arxiv.org/abs/2503.08481\n- Code: https://github.com/unira-zwj/PhysVLM\n\n\n<a name=\"KD\"></a>\n\n# 知识蒸馏(Knowledge Distillation)\n\n<a name=\"Depth-Estimation\"></a>\n\n\n# 深度估计(Depth Estimation)\n\n**DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos**\n\n- Project: https://depthcrafter.github.io\n- Paper: https://arxiv.org/abs/2409.02095\n- Code: https://github.com/Tencent/DepthCrafter\n\n\n**MonSter: Marry Monodepth to Stereo Unleashes Power**\n\n- Paper: https://arxiv.org/abs/2501.08643\n- Code: https://github.com/Junda24/MonSter\n\n**DEFOM-Stereo: Depth Foundation Model Based Stereo Matching**\n\n- 
Project: https://insta360-research-team.github.io/DEFOM-Stereo/\n- Paper: https://arxiv.org/abs/2501.09466\n- Code: https://github.com/Insta360-Research-Team/DEFOM-Stereo\n\n\n<a name=\"Stereo-Matching\"></a>\n\n# 立体匹配(Stereo Matching)\n\n**MonSter: Marry Monodepth to Stereo Unleashes Power**\n\n- Paper: https://arxiv.org/abs/2501.08643\n- Code: https://github.com/Junda24/MonSter\n\n\n<a name=\"Low-light\"></a>\n\n# 暗光图像增强(Low-light Image Enhancement)\n\n\n**HVI: A New Color Space for Low-light Image Enhancement**\n\n- Paper: https://arxiv.org/abs/2502.20272\n- Code: https://github.com/Fediory/HVI-CIDNet\n- Demo: https://huggingface.co/spaces/Fediory/HVI-CIDNet_Low-light-Image-Enhancement_\n\n**ReDDiT: Efficient Diffusion as Low Light Enhancer**\n\n- Paper: https://arxiv.org/abs/2410.12346\n- Code: https://github.com/lgz-0713/ReDDiT\n\n\n\n<a name=\"IC\"></a>\n\n# 图像压缩(Image Compression)\n\n**MambaIC: State Space Models for High-Performance Learned Image Compression**\n\n- Paper: https://arxiv.org/abs/2503.12461\n- Code: https://arxiv.org/abs/2503.12461\n\n\n<a name=\"SGG\"></a>\n\n# 场景图生成(Scene Graph Generation)\n\n\n\n<a name=\"ST\"></a>\n\n# 风格迁移(Style Transfer)\n\n**StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements**\n\n- Project: https://stylestudio-official.github.io/\n- Paper: https://arxiv.org/abs/2412.08503\n- Code: https://github.com/Westlake-AGI-Lab/StyleStudio\n\n\n<a name=\"IQA\"></a>\n\n# 图像质量评价(Image Quality Assessment)\n\n**Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language**\n\n- Homepage: https://yichengchen24.github.io/projects/autocherrypicker\n- Paper: https://arxiv.org/pdf/2406.20085\n- Code: https://github.com/yichengchen24/ACP\n\n<a name=\"Video-Quality-Assessment\"></a>\n\n# 视频质量评价(Video Quality Assessment)\n\n<a name=\"CS\"></a>\n\n# 压缩感知(Compressive Sensing)\n\n**Using Powerful Prior Knowledge of Diffusion Model in Deep Unfolding Networks for Image Compressive 
Sensing**\n\n- Paper: https://arxiv.org/abs/2503.08429\n- Code: https://github.com/FengodChen/DMP-DUN-CVPR2025\n\n\n<a name=\"Datasets\"></a>\n\n# 数据集(Datasets)\n\n\n**Objaverse++: Curated 3D Object Dataset with Quality Annotations**\n\n- Paper: https://arxiv.org/abs/2504.07334\n- Code: https://github.com/TCXX/ObjaversePlusPlus\n\n\n<a name=\"Others\"></a>\n\n# 其他(Others)\n\n\n**DTGBrepGen: A Novel B-rep Generative Model through Decoupling Topology and Geometry**\n\n- Paper: https://arxiv.org/abs/2503.13110\n- Code: https://github.com/jinli99/DTGBrepGen\n\n\n**Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation**\n\n- Paper: https://arxiv.org/abs/2503.19307\n- Code: https://github.com/delaprada/HandSynthesis.git\n\n**EVOS: Efficient Implicit Neural Training via EVOlutionary Selector**\n\n- Homepage: https://weixiang-zhang.github.io/proj-evos/\n- Paper: https://arxiv.org/abs/2412.10153\n- Code: https://github.com/zwx-open/EVOS-INR\n  "
  },
  {
    "path": "README.md",
    "content": "# CVPR 2026 论文和开源项目合集(Papers with Code)\n\nCVPR 2026 decisions are now available on OpenReview！25.42% = 4090 / 16092\n\n\n> 注1：欢迎各位大佬提交issue，分享CVPR 2026论文和开源项目！\n>\n> 注2：关于往年CV顶会论文以及其他优质CV论文和大盘点，详见： https://github.com/amusi/daily-paper-computer-vision\n>\n> - [ICCV 2025](https://github.com/amusi/ICCV2025-Papers-with-Code)\n> - [ECCV 2024](https://github.com/amusi/ECCV2024-Papers-with-Code)\n\n\n欢迎扫码加入【CVer学术交流群】，可以获取CVPR 2026等最前沿工作！这是最大的计算机视觉AI知识星球！每日更新，第一时间分享最新最前沿的计算机视觉、AIGC、扩散模型、多模态、深度学习、自动驾驶、医疗影像和遥感等方向的学习资料，快加入学起来！\n\n![](CVer学术交流群.png)\n\n# 【CVPR 2026 论文开源目录】\n\n- [3DGS(Gaussian Splatting)](#3DGS)\n- [Agent)](#Agent)\n- [Avatars](#Avatars)\n- [Backbone](#Backbone)\n- [CLIP](#CLIP)\n- [Mamba](#Mamba)\n- [Embodied AI](#Embodied-AI)\n- [GAN](#GAN)\n- [GNN](#GNN)\n- [多模态大语言模型(MLLM)](#MLLM)\n- [大语言模型(LLM)](#LLM)\n- [具身智能(Embodied AI)](#Embodied)\n- [空间智能(Spatial Intelligence](#SI)\n- [NAS](#NAS)\n- [OCR](#OCR)\n- [NeRF](#NeRF)\n- [DETR](#DETR)\n- [扩散模型(Diffusion Models)](#Diffusion)\n- [ReID(重识别)](#ReID)\n- [长尾分布(Long-Tail)](#Long-Tail)\n- [Vision Transformer](#Vision-Transformer)\n- [视觉和语言(Vision-Language)](#VL)\n- [自监督学习(Self-supervised Learning)](#SSL)\n- [数据增强(Data Augmentation)](#DA)\n- [目标检测(Object Detection)](#Object-Detection)\n- [异常检测(Anomaly Detection)](#Anomaly-Detection)\n- [目标跟踪(Visual Tracking)](#VT)\n- [语义分割(Semantic Segmentation)](#Semantic-Segmentation)\n- [实例分割(Instance Segmentation)](#Instance-Segmentation)\n- [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation)\n- [医学图像(Medical Image)](#MI)\n- [医学图像分割(Medical Image Segmentation)](#MIS)\n- [视频目标分割(Video Object Segmentation)](#VOS)\n- [视频实例分割(Video Instance Segmentation)](#VIS)\n- [参考图像分割(Referring Image Segmentation)](#RIS)\n- [图像抠图(Image Matting)](#Matting)\n- [图像编辑(Image Editing)](#Image-Editing)\n- [Low-level Vision](#LLV)\n- [超分辨率(Super-Resolution)](#SR)\n- [去噪(Denoising)](#Denoising)\n- [去模糊(Deblur)](#Deblur)\n- [自动驾驶(Autonomous Driving)](#Autonomous-Driving)\n- [3D点云(3D 
Point Cloud)](#3D-Point-Cloud)\n- [3D目标检测(3D Object Detection)](#3DOD)\n- [3D语义分割(3D Semantic Segmentation)](#3DSS)\n- [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking)\n- [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC)\n- [3D配准(3D Registration)](#3D-Registration)\n- [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation)\n- [3D人体Mesh估计(3D Human Mesh Estimation)](#3D-Human-Pose-Estimation)\n- [3D Visual Grounding(3D视觉定位)](#3DVG)\n- [医学图像(Medical Image)](#Medical-Image)\n- [图像生成(Image Generation)](#Image-Generation)\n- [视频生成(Video Generation)](#Video-Generation)\n- [3D生成(3D Generation)](#3D-Generation)\n- [视频理解(Video Understanding)](#Video-Understanding)\n- [行为检测(Action Detection)](#Action-Detection)\n- [遥感(Remote)](#Remote)\n- [文本检测(Text Detection)](#Text-Detection)\n- [知识蒸馏(Knowledge Distillation)](#KD)\n- [模型剪枝(Model Pruning)](#Pruning)\n- [图像压缩(Image Compression)](#IC)\n- [视频压缩(Video Compression)](#VC)\n- [三维重建(3D Reconstruction)](#3D-Reconstruction)\n- [深度估计(Depth Estimation)](#Depth-Estimation)\n- [轨迹预测(Trajectory Prediction)](#TP)\n- [车道线检测(Lane Detection)](#Lane-Detection)\n- [图像描述(Image Captioning)](#Image-Captioning)\n- [视觉问答(Visual Question Answering)](#VQA)\n- [手语识别(Sign Language Recognition)](#SLR)\n- [视频预测(Video Prediction)](#Video-Prediction)\n- [新视点合成(Novel View Synthesis)](#NVS)\n- [Zero-Shot Learning(零样本学习)](#ZSL)\n- [立体匹配(Stereo Matching)](#Stereo-Matching)\n- [特征匹配(Feature Matching)](#Feature-Matching)\n- [暗光图像增强(Low-light Image Enhancement)](#Low-light)\n- [场景图生成(Scene Graph Generation)](#SGG)\n- [图像检索(Image Retrieval)](#Image-Retrieval)\n- [风格迁移(Style Transfer)](#ST)\n- [隐式神经表示(Implicit Neural Representations)](#INR)\n- [图像质量评价(Image Quality Assessment)](#IQA)\n- [视频质量评价(Video Quality Assessment)](#Video-Quality-Assessment)\n- [压缩感知(Compressive Sensing)](#CS)\n- [数据集(Datasets)](#Datasets)\n- [新任务(New Tasks)](#New-Tasks)\n- [其他(Others)](#Others)\n\n<a name=\"3DGS\"></a>\n\n# 3DGS(Gaussian Splatting)\n\n**Dropping Anchor and Spherical 
Harmonics for Sparse-view Gaussian Splatting**\n\n- Paper: https://arxiv.org/abs/2602.20933\n- Code: \n- Project: https://sk-fun.fun/DropAnSH-GS\n\n**Topology-Aware Gaussian Splatting for Dynamic Mesh Modeling and Tracking**\n\n- Paper: https://arxiv.org/abs/2512.01329\n- Project: https://haza628.github.io/tagSplat/\n\n**FastGS: Training 3D Gaussian Splatting in 100 Seconds**\n\n- Paper: https://arxiv.org/pdf/2511.04283\n- Code: https://github.com/fastgs/FastGS\n- Project: https://fastgs.github.io/\n\n\n<a name=\"Agent\"></a>\n\n# Agent\n\n\n\n\n<a name=\"Avatars\"></a>\n\n# Avatars\n\n\n# Backbone\n\n\n\n\n<a name=\"CLIP\"></a>\n\n# CLIP\n\n\n\n<a name=\"Mamba\"></a>\n\n# Mamba\n\n\n\n<a name=\"GAN\"></a>\n\n# GAN\n\n<a name=\"OCR\"></a>\n\n# OCR\n\n\n<a name=\"NeRF\"></a>\n\n# NeRF\n\n\n\n<a name=\"DETR\"></a>\n\n# DETR\n\n\n\n\n<a name=\"Prompt\"></a>\n\n# Prompt\n\n<a name=\"MLLM\"></a>\n\n# 多模态大语言模型(MLLM)\n\n**Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking**\n\n- Paper: https://arxiv.org/abs/2602.20330\n- Code: https://github.com/UIUC-MONET/vlm-circuit-tracing\n\n**UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark**\n\n- Paper: https://arxiv.org/abs/2603.05075\n- Code: \n- Project: https://any2any-mllm.github.io/unim/\n\n\n\n<a name=\"LLM\"></a>\n\n# 大语言模型(LLM)\n\n\n<a name=\"Embodied-AI\"></a>\n\n\n# 具身智能(Embodied AI)\n\n**Wanderland: Geometrically Grounded Simulation for Open-World Embodied AI**\n\n- Paper: https://arxiv.org/abs/2511.20620\n- Code: https://github.com/ai4ce/wanderland\n- Project: https://ai4ce.github.io/wanderland/\n\n\n<a name=\"SI\"></a>\n\n\n# 空间智能(Spatial Intelligence)\n\n**Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning**\n\n- Paper: https://arxiv.org/abs/2510.27606\n- Code: https://github.com/InternLM/Spatial-SSRL\n- Model: https://huggingface.co/internlm/Spatial-SSRL-7B\n\n\n<a name=\"NAS\"></a>\n\n# NAS\n\n<a 
name=\"ReID\"></a>\n\n# ReID(重识别)\n\n\n**MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-Identification**\n\n- Paper: https://arxiv.org/abs/2512.03404\n- Code: https://github.com/yjzhao1019/MOS\n\n\n<a name=\"Diffusion\"></a>\n\n# 扩散模型(Diffusion Models)\n\n\n\n<a name=\"Vision-Transformer\"></a>\n\n# Vision Transformer\n\n\n\n<a name=\"VL\"></a>\n\n# 视觉和语言(Vision-Language)\n\n**StructXLIP: Enhancing Vision-language Models with Multimodal Structural Cues**\n\n- Paper: https://arxiv.org/abs/2602.20089\n- Code: https://github.com/intelligolabs/StructXLIP\n\n**ApET: Approximation-Error Guided Token Compression for Efficient VLMs**\n\n- Paper: https://arxiv.org/abs/2602.19870\n- Code: https://github.com/MaQianKun0/ApET\n\n**Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking**\n\n- Paper: https://arxiv.org/abs/2602.20330\n- Code: https://github.com/UIUC-MONET/vlm-circuit-tracing\n\n\n<a name=\"Object-Detection\"></a>\n\n# 目标检测(Object Detection)\n\n\n\n\n<a name=\"Anomaly-Detection\"></a>\n\n# 异常检测(Anomaly Detection)\n\n\n\n<a name=\"VT\"></a>\n\n# 目标跟踪(Object Tracking)\n\n\n\n\n<a name=\"MI\"></a>\n\n# 医学图像(Medical Image)\n\n\n\n\n\n# 医学图像分割(Medical Image Segmentation)\n\n**MedCLIPSeg: Probabilistic Vision–Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation**\n\n- Paper: https://arxiv.org/abs/2602.20423\n- Code: https://github.com/HealthX-Lab/MedCLIPSeg\n- Project: https://tahakoleilat.github.io/MedCLIPSeg\n\n<a name=\"Autonomous-Driving\"></a>\n\n# 自动驾驶(Autonomous Driving)\n\n**Open-Vocabulary Domain Generalization in Urban-Scene Segmentation**\n\n- Paper: https://arxiv.org/pdf/2602.18853\n- Code: https://github.com/DZhaoXd/s2_corr\n\n**U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences**\n\n- Paper: https://arxiv.org/abs/2512.02982\n- Code: https://github.com/worldbench/U4D\n\n\n# 3D点云(3D-Point-Cloud)\n\n**CLIPoint3D: Language-Grounded Few-Shot Unsupervised 
3D Point Cloud Domain Adaptation**\n\n- Paper: https://arxiv.org/abs/2602.20409\n- Code: https://github.com/SarthakM320/CLIPoint3D\n\n\n<a name=\"3DOD\"></a>\n\n# 3D Object Detection\n\n\n\n<a name=\"3DOD\"></a>\n\n# 3D Semantic Segmentation\n\n\n\n\n\n<a name=\"LLV\"></a>\n\n# Low-level Vision\n\n\n\n<a name=\"SR\"></a>\n\n# Super-Resolution\n\n\n\n\n<a name=\"Denoising\"></a>\n\n# Denoising\n\n## Image Denoising\n\n<a name=\"3D-Human-Pose-Estimation\"></a>\n\n# 3D Human Pose Estimation\n\n\n\n<a name=\"3DVG\"></a>\n\n# 3D Visual Grounding\n\n\n\n\n<a name=\"Image-Generation\"></a>\n\n# Image Generation\n\n\n**ExpPortrait: Expressive Portrait Generation via Personalized Representation**\n\n- Paper: https://arxiv.org/abs/2602.19900\n- Code: \n\n\n<a name=\"Video-Generation\"></a>\n\n# Video Generation\n\n\n\n\n<a name=\"Image-Editing\"></a>\n\n# Image Editing\n\n\n\n<a name=\"Video-Editing\"></a>\n\n# Video Editing\n\n\n\n<a name=\"3D-Generation\"></a>\n\n# 3D Generation\n\n\n\n\n<a name=\"3D-Reconstruction\"></a>\n\n# 3D Reconstruction\n\n**tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction**\n\n- Project: https://cwchenwang.github.io/tttLRM/\n- Paper: https://arxiv.org/abs/2602.20160\n- Code: https://github.com/cwchenwang/tttLRM\n\n**Flow3r: Factored Flow Prediction for Scalable Visual Geometry Learning**\n\n- Project: https://flow3r-project.github.io/\n- Paper: https://arxiv.org/abs/2602.20157\n- Code: https://github.com/Kidrauh/flow3r\n\n**RAP: Fast Feedforward Rendering-Free Attribute-Guided Primitive Importance Score Prediction for Efficient 3D Gaussian Splatting Processing**\n\n- Paper: https://arxiv.org/abs/2602.19753\n- Code: https://github.com/yyyykf/RAP\n\n\n<a name=\"HMG\"></a>\n\n# Human Motion Generation\n\n<a name=\"Video-Understanding\"></a>\n\n# Video Understanding\n\n\n\n\n\n<a name=\"Remote\"></a>\n\n# Remote Sensing\n\n**Brewing 
Stronger Features: Dual-Teacher Distillation for Multispectral Earth Observation**\n\n- Paper: https://arxiv.org/abs/2602.19863\n- Code: None\n\n\n<a name=\"KD\"></a>\n\n# Knowledge Distillation\n\n<a name=\"Depth-Estimation\"></a>\n\n\n# Depth Estimation\n\n\n\n\n<a name=\"Stereo-Matching\"></a>\n\n# Stereo Matching\n\n\n<a name=\"Low-light\"></a>\n\n# Low-light Image Enhancement\n\n\n\n\n\n<a name=\"IC\"></a>\n\n# Image Compression\n\n\n\n<a name=\"VC\"></a>\n\n# Video Compression\n\n**UniComp: Rethinking Video Compression Through Informational Uniqueness**\n\n- Paper: https://arxiv.org/abs/2512.03575\n- Code: https://github.com/TimeMarker-LLM/UniComp\n\n\n\n<a name=\"SGG\"></a>\n\n# Scene Graph Generation\n\n\n<a name=\"Image-Retrieval\"></a>\n\n# Image Retrieval\n\n**PinPoint: Evaluation of Composed Image Retrieval with Explicit Negatives, Multi-Image Queries, and Paraphrase Testing**\n\n- Paper: https://arxiv.org/abs/2603.04598\n- Code: \n\n\n<a name=\"ST\"></a>\n\n# Style Transfer\n\n\n\n<a name=\"IQA\"></a>\n\n# Image Quality Assessment\n\n\n\n<a name=\"Video-Quality-Assessment\"></a>\n\n# Video Quality Assessment\n\n<a name=\"CS\"></a>\n\n# Compressive Sensing\n\n\n\n<a name=\"Datasets\"></a>\n\n# Datasets\n\n\n\n\n<a name=\"Others\"></a>\n\n# Others\n\n**Decoupling Defense Strategies for Robust Image Watermarking**\n\n- Paper: https://arxiv.org/abs/2602.20053\n- Code: None\n\n**Multi-Modal Representation Learning via Semi-Supervised Rate Reduction for Generalized Category Discovery**\n\n- Paper: https://arxiv.org/abs/2602.19910\n- Code: \n\n**The Invisible Gorilla Effect in Out-of-distribution Detection**\n\n- Paper: https://arxiv.org/abs/2602.20068\n- Code: https://github.com/HarryAnthony/Invisible_Gorilla_Effect\n\n**SimLBR: Learning to Detect Fake Images by Learning to Detect Real Images**\n\n- Paper: https://arxiv.org/abs/2602.20412\n- Code: 
\n\n**RecoverMark: Robust Watermarking for Localization and Recovery of Manipulated Faces**\n\n- Paper: https://arxiv.org/abs/2602.20618\n- Code: \n\n**Probing and Bridging Geometry-Interaction Cues for Affordance Reasoning in Vision Foundation Models**\n\n- Paper:\n- Code: \n\n**GEM-TFL: Bridging Weak and Full Supervision for Forgery Localization through EM-Guided Decomposition and Temporal Refinement**\n\n- Paper: https://arxiv.org/abs/2603.05095\n- Code: \n\n\n**FOZO: Forward-Only Zeroth-Order Prompt Optimization for Test-Time Adaptation**\n\n- Paper: https://arxiv.org/abs/2603.04733\n- Code: https://github.com/eVI-group-SCU/FOZO\n\n**Mitigating Instance Entanglement in Instance-Dependent Partial Label Learning**\n\n- Paper: https://arxiv.org/abs/2603.04825\n- Code: https://github.com/RyanZhaoIc/CAD\n\n  "
  },
  {
    "path": "master",
    "content": ""
  }
]