Repository: extreme-assistant/CVPR2024-Paper-Code-Interpretation
Branch: master
Commit: 0857efb94cb2
Files: 10
Total size: 1.5 MB

Directory structure:
gitextract_xntj19ao/

├── CVPR2020.md
├── CVPR2021.md
├── CVPR2022.md
├── CVPR2023.md
├── CVPR2024.md
├── README.md
├── cvpr2018-githublinks.csv
├── cvpr_2019_githublinks.csv
├── cvpr_2019_oral.csv
└── cvpr_2019_poster.csv

================================================
FILE CONTENTS
================================================

================================================
FILE: CVPR2020.md
================================================

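* Recommended reading:<br>
  * [Collection of computer vision survey papers, 2020-2021](https://github.com/extreme-assistant/survey-computer-vision)<br>
  * [Complete collection of object tracking resources, 2019-2020 (papers, model code, leading labs)](https://github.com/extreme-assistant/Object-Tracking-Paper-Benchmark-Team)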


------

<br>

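# CVPR2020 latest information and paper download post (Papers/Codes/Project/PaperReading/Demos/live streams/paper-sharing sessions, etc.)


Official website: http://cvpr2020.thecvf.com/<br>
Venue and dates: Seattle, Washington, June 14-19, 2020<br>
Paper acceptance announcement: February 24, 2020<br>

Related questions:
* [What do you make of submissions to CVPR, the top computer vision conference, exceeding 10,000 in 2020?](https://www.zhihu.com/question/356099725/)<br>
* [What do you make of the CVPR 2020 acceptance results? Which papers stand out?](https://www.zhihu.com/question/372070853)<br><br>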

# Table of contents
[1. CVPR2020 accepted papers by category (continuously updated)](#100)<br>
[2. CVPR2020 Oral (continuously updated)](#101)<br>
[3. CVPR2020 paper interpretations](#102)<br>
[4. To do list](#103)<br>
[5. Related links](#104)<br>


<br><br>

<a name="100"/>

# 1. CVPR2020 accepted papers (continuously updated)<br>

### Roundups by category<br>
*   [20. CVPR 2020 paper roundup: action detection and action segmentation](https://bbs.cvmart.net/topics/3028)<br>

*   [19. CVPR 2020 paper roundup: action recognition](https://bbs.cvmart.net/topics/3000)<br>

*   [18. CVPR 2020 paper roundup: optical flow](https://bbs.cvmart.net/topics/2992)<br>

*   [17. CVPR 2020 paper roundup: image and video retrieval](https://bbs.cvmart.net/topics/2964)<br>

*   [16. CVPR 2020 paper roundup: remote sensing and aerial image processing and recognition](https://bbs.cvmart.net/topics/2953)<br>

*   [15. CVPR 2020 paper roundup: image quality assessment](https://bbs.cvmart.net/topics/2923)<br>

*   [14. CVPR 2020 paper roundup: image inpainting](https://bbs.cvmart.net/topics/2903)<br>

*   [13. CVPR 2020 paper roundup: image enhancement and image restoration](https://bbs.cvmart.net/topics/2902)<br>

*   [12. CVPR 2020 paper roundup: deraining, dehazing, and deblurring](https://bbs.cvmart.net/topics/2876)<br>

*   [11. CVPR 2020 paper roundup: medical image processing and recognition](https://bbs.cvmart.net/topics/2855)<br>

*   [10. CVPR 2020 paper roundup: image matting](https://bbs.cvmart.net/topics/2854)<br>

*   [9. CVPR 2020 paper roundup: image segmentation (complete edition)](https://bbs.cvmart.net/topics/2829)<br>

*   [8. CVPR 2020 paper roundup: panoptic segmentation and video object segmentation](https://bbs.cvmart.net/topics/2818)<br>

*   [7. CVPR 2020 paper roundup: super-resolution](https://bbs.cvmart.net/topics/2725)<br>

*   [6. CVPR 2020 paper roundup: object detection](https://bbs.cvmart.net/topics/2732)<br>

*   [5. CVPR 2020 paper roundup: face technology](https://bbs.cvmart.net/topics/2720)<br>

*   [4. CVPR 2020 paper roundup: object tracking](https://bbs.cvmart.net/topics/2733)<br>

*   [3. CVPR 2020 paper roundup: text in images](https://bbs.cvmart.net/topics/2778)<br>

*   [2. CVPR 2020 paper roundup: pedestrian detection and re-identification](https://bbs.cvmart.net/topics/2751)<br>

*   [1. CVPR 2020 paper roundup: instance segmentation](https://bbs.cvmart.net/topics/2806)<br>


<br>

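### Contents<br>
[1. Object detection](#1)<br>
[2. Face recognition](#3)<br>
[3. Object tracking](#4)<br>
[4. 3D point clouds / 3D reconstruction / 3D detection / 3D segmentation / depth estimation](#5)<br>
[5. Image recognition](#18)<br>
[6. Image processing](#6)<br>
[7. Image classification](#7)<br>
[8. Image segmentation](#2)<br>
[9. Pose estimation / action recognition](#8)<br>
[10. Video analysis](#9)<br>
[11. OCR](#10)<br>
[12. GAN](#11)<br>
[13. Few-shot / zero-shot](#12)<br>
[14. Weakly supervised / unsupervised / self-supervised](#13)<br>
[15. Pedestrian tracking / pedestrian detection / ReID](#14)<br>
[16. Neural networks / model acceleration / model compression](#15)<br>
[17. Super-resolution](#16)<br>
[18. Visual commonsense / datasets / other](#17)<br>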


<br><br>

<a name="1"/>

### Object detection

1. Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection<br>
Paper: https://arxiv.org/abs/1912.02424<br>
Code: https://github.com/sfzhang15/ATSS<br><br>

2. Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector<br>
Paper: https://arxiv.org/abs/1908.01998<br><br>

3. AugFPN: Improving Multi-scale Feature Learning for Object Detection<br>
Paper: https://arxiv.org/abs/1912.05384<br><br>

4. Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection<br>
Paper: https://arxiv.org/abs/2003.11818<br>
Code: https://github.com/ggjy/HitDet.pytorch<br><br>

5. Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation<br>
Paper: https://arxiv.org/abs/2003.08813<br><br>

6. CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection<br>
Paper: https://arxiv.org/abs/2003.09119<br>
Code: https://github.com/KiveeDong/CentripetalNet<br><br>


<br><br>

<a name="3"/>

### Face recognition

1. Towards Universal Representation Learning for Deep Face Recognition<br>
Paper: https://arxiv.org/abs/2002.11841<br><br>

2. Suppressing Uncertainties for Large-Scale Facial Expression Recognition<br>
Paper: https://arxiv.org/abs/2002.10392<br>
Code: https://github.com/kaiwang960112/Self-Cure-Network<br><br>

3. Face X-ray for More General Face Forgery Detection<br>
Paper: https://arxiv.org/pdf/1912.13458.pdf<br><br>

4. Pose Agnostic Cross-spectral Hallucination via Disentangling Independent Factors<br>
Paper: https://arxiv.org/abs/1909.04365<br><br>

5. Deep Spatial Gradient and Temporal Depth Learning for Face Anti-spoofing<br>
Paper: https://arxiv.org/abs/2003.08061<br>
Code: https://github.com/clks-wzz/FAS-SGTD<br><br>

6. Learning Meta Face Recognition in Unseen Domains<br>
Paper: https://arxiv.org/abs/2003.07733<br>
Code: https://github.com/cleardusk/MFR<br><br>


<br><br>

<a name="4"/>

### Object tracking

1. ROAM: Recurrently Optimizing Tracking Model<br>
Paper: https://arxiv.org/abs/1907.12006<br><br>

<br><br>

<a name="5"/>

### 3D point clouds / 3D reconstruction / 3D detection / 3D segmentation / depth estimation

* 3D point clouds & reconstruction

1. PF-Net: Point Fractal Network for 3D Point Cloud Completion<br>
Paper: https://arxiv.org/abs/2003.00410<br><br>

2. PointAugment: an Auto-Augmentation Framework for Point Cloud Classification<br>
Paper: https://arxiv.org/abs/2002.10876<br>
Code: https://github.com/liruihui/PointAugment/<br><br>

3. Learning multiview 3D point cloud registration<br>
Paper: https://arxiv.org/abs/2001.05119<br><br>

4. C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds<br>
Paper: https://arxiv.org/abs/1912.07009<br><br>

5. RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds<br>
Paper: https://arxiv.org/abs/1911.11236<br><br>

6. Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image<br>
Paper: https://arxiv.org/abs/2002.12212<br><br>

7. Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion<br>
Paper: https://arxiv.org/abs/2003.01456<br><br>

8. In Perfect Shape: Certifiably Optimal 3D Shape Reconstruction from 2D Landmarks<br>
Paper: https://arxiv.org/pdf/1911.11924.pdf<br><br>

9. Attentive Context Normalization for Robust Permutation-Equivariant Learning<br>
Paper: https://arxiv.org/abs/1907.02545<br>
Authors: Weiwei Sun, Wei Jiang, Eduard Trulls, Andrea Tagliasacchi, Kwang Moo Yi<br><br>

10. PQ-NET: A Generative Part Seq2Seq Network for 3D Shapes<br>
Paper: https://arxiv.org/abs/1911.10949<br><br>

11. SG-NN: Sparse Generative Neural Networks for Self-Supervised Scene Completion of RGB-D Scans<br>
Paper: https://arxiv.org/abs/1912.00036<br><br>

12. Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching<br>
Paper: https://arxiv.org/abs/1912.06378<br>
Code: https://github.com/alibaba/cascade-stereo<br><br>

13. Unsupervised Learning of Intrinsic Structural Representation Points<br>
Paper: https://arxiv.org/abs/2003.01661<br>
Code: https://github.com/NolenChen/3DStructurePoints<br><br>

* 3D reconstruction
1. Leveraging 2D Data to Learn Textured 3D Mesh Generation<br>
Paper: https://arxiv.org/abs/2004.04180<br><br>

2. ARCH: Animatable Reconstruction of Clothed Humans<br>
Paper: https://arxiv.org/abs/2004.04572<br><br>

3. Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions<br>
Paper: https://arxiv.org/abs/2004.03967<br><br>

<br><br>

<a name="18"/>

### Image recognition

* Image feature matching
1. Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task<br>
Paper: https://arxiv.org/abs/1912.00623<br><br>

2. Correspondence Networks with Adaptive Neighbourhood Consensus<br>
Paper: https://arxiv.org/abs/2003.12059<br><br>

* Image captioning
1. Normalized and Geometry-Aware Self-Attention Network for Image Captioning<br>
Paper: https://arxiv.org/abs/2003.08897<br><br>

<br><br>

<a name="6"/>

### Image processing

1. Learning to Shade Hand-drawn Sketches<br>
Paper: https://arxiv.org/abs/2002.11812<br><br>

2. Single Image Reflection Removal through Cascaded Refinement<br>
Paper: https://arxiv.org/abs/1911.06634<br><br>

3. Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data<br>
Paper: https://arxiv.org/abs/2002.11297<br><br>

4. Deep Image Harmonization via Domain Verification<br>
Paper: https://arxiv.org/abs/1911.13239<br>
Code: https://github.com/bcmi/Image_Harmonization_Datasets<br><br>

5. RoutedFusion: Learning Real-time Depth Map Fusion<br>
Paper: https://arxiv.org/pdf/2001.04388.pdf<br><br>

6. Neural Contours: Learning to Draw Lines from 3D Shapes<br>
Paper: https://arxiv.org/abs/2003.10333<br><br>

7. Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content<br>
Paper: https://arxiv.org/abs/2003.05863<br><br>



<br><br>

<a name="7"/>

### Image classification

1. Self-training with Noisy Student improves ImageNet classification<br>
Paper: https://arxiv.org/abs/1911.04252<br><br>

2. Image Matching across Wide Baselines: From Paper to Practice<br>
Paper: https://arxiv.org/abs/2003.01587<br><br>

3. Towards Robust Image Classification Using Sequential Attention Models<br>
Paper: https://arxiv.org/abs/1912.02184<br><br>

4. Learning in the Frequency Domain<br>
Paper: https://arxiv.org/abs/2002.12416<br><br>

5. Learning from Web Data with Memory Module<br>
Paper: https://arxiv.org/abs/1906.12028<br><br>

6. Making Better Mistakes: Leveraging Class Hierarchies with Deep Networks<br>
Paper: https://arxiv.org/abs/1912.09393<br><br>


<br><br>

<a name="2"/>

### Image segmentation

1. Semi-Supervised Semantic Image Segmentation with Self-correcting Networks<br>
Paper: https://arxiv.org/abs/1811.07073<br><br>

2. Deep Snake for Real-Time Instance Segmentation<br>
Paper: https://arxiv.org/abs/2001.01629<br><br>

3. CenterMask: Real-Time Anchor-Free Instance Segmentation<br>
Paper: https://arxiv.org/abs/1911.06667<br>
Code: https://github.com/youngwanLEE/CenterMask<br><br>

4. SketchGCN: Semantic Sketch Segmentation with Graph Convolutional Networks<br>
Paper: https://arxiv.org/abs/2003.00678<br><br>

5. PolarMask: Single Shot Instance Segmentation with Polar Representation<br>
Paper: https://arxiv.org/abs/1909.13226<br>
Code: https://github.com/xieenze/PolarMask<br><br>

6. xMUDA: Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation<br>
Paper: https://arxiv.org/abs/1911.12676<br><br>

7. BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation<br>
Paper: https://arxiv.org/abs/2001.00309<br><br>

8. Enhancing Generic Segmentation with Learned Region Representations<br>
Paper: https://arxiv.org/abs/1911.08564<br><br>

<br><br>

<a name="8"/>

### Pose estimation / action recognition

1. VIBE: Video Inference for Human Body Pose and Shape Estimation<br>
Paper: https://arxiv.org/abs/1912.05656<br>
Code: https://github.com/mkocabas/VIBE<br><br>

2. Distribution-Aware Coordinate Representation for Human Pose Estimation<br>
Paper: https://arxiv.org/abs/1910.06278<br>
Code: https://github.com/ilovepose/DarkPose<br><br>

3. 4D Association Graph for Realtime Multi-person Motion Capture Using Multiple Video Cameras<br>
Paper: https://arxiv.org/abs/2002.12625<br><br>

4. Optimal least-squares solution to the hand-eye calibration problem<br>
Paper: https://arxiv.org/abs/2002.10838<br><br>

5. D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry<br>
Paper: https://arxiv.org/abs/2003.01060<br><br>

6. Multi-Modal Domain Adaptation for Fine-Grained Action Recognition<br>
Paper: https://arxiv.org/abs/2001.09691<br><br>

7. Distribution Aware Coordinate Representation for Human Pose Estimation<br>
Paper: https://arxiv.org/abs/1910.06278<br><br>

8. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation<br>
Paper: https://arxiv.org/abs/1911.07524<br><br>

9. PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation<br>
Paper: https://arxiv.org/abs/1911.04231<br><br>

10. Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation<br>
Paper: https://arxiv.org/abs/2003.02824<br><br>

11. G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features<br>
Paper: https://arxiv.org/abs/2003.11089<br><br>

12. Deep Image Spatial Transformation for Person Image Generation<br>
Paper: https://arxiv.org/abs/2003.00696<br>
Code: https://github.com/RenYurui/Global-Flow-Local-Attention<br><br>


<br><br>

<a name="9"/>

### Video analysis

1. Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications<br>
Paper: https://arxiv.org/abs/2003.01455<br>
Code: https://github.com/bbrattoli/ZeroShotVideoClassification<br><br>

2. Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs<br>
Paper: https://arxiv.org/abs/2003.00387<br><br>

3. Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning<br>
Paper: https://arxiv.org/abs/2003.00392<br><br>

4. Object Relational Graph with Teacher-Recommended Learning for Video Captioning<br>
Paper: https://arxiv.org/abs/2002.11566<br><br>

5. Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution<br>
Paper: https://arxiv.org/abs/2002.11616<br><br>

6. Blurry Video Frame Interpolation<br>
Paper: https://arxiv.org/abs/2002.12259<br><br>

7. Hierarchical Conditional Relation Networks for Video Question Answering<br>
Paper: https://arxiv.org/abs/2002.10698<br><br>

8. Action Modifiers: Learning from Adverbs in Instructional Video<br>
Paper: https://arxiv.org/abs/1912.06617<br><br>

9. Visual Grounding in Video for Unsupervised Word Translation<br>
Paper: https://arxiv.org/abs/2003.05078<br>
Code: https://github.com/gsig/visual-grounding<br><br>

10. MaskFlownet: Asymmetric Feature Matching with Learnable Occlusion Mask (video analysis: optical flow estimation)<br>
Paper: https://arxiv.org/abs/2003.10955<br>
Code: https://github.com/microsoft/MaskFlownet<br><br>

11. Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects (video prediction)<br>
Paper: https://arxiv.org/abs/2003.12045<br>
Code: https://ehsanik.github.io/forcecvpr2020<br><br>


<br><br>

<a name="10"/>

### OCR

1. ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network<br>
Paper: https://arxiv.org/abs/2002.10200<br>
Code: https://github.com/Yuliang-Liu/bezier_curve_text_spotting, https://github.com/aim-uofa/adet<br><br>

2. Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA<br>
Paper: https://arxiv.org/abs/1911.06258<br><br>

<br><br>

<a name="11"/>

### GAN

1. Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models<br>
Paper: https://arxiv.org/abs/1911.12287<br>
Code: https://github.com/giannisdaras/ylg<br><br>

2. MSG-GAN: Multi-Scale Gradient GAN for Stable Image Synthesis<br>
Paper: https://arxiv.org/abs/1903.06048<br><br>

3. Robust Design of Deep Neural Networks against Adversarial Attacks based on Lyapunov Theory<br>
Paper: https://arxiv.org/abs/1911.04636<br><br>

4. PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer<br>
Paper: https://arxiv.org/abs/1909.06956<br><br>


<br><br>

<a name="12"/>

### Few-shot / zero-shot

1. Improved Few-Shot Visual Classification<br>
Paper: https://arxiv.org/pdf/1912.03432.pdf<br><br>

2. Meta-Transfer Learning for Zero-Shot Super-Resolution<br>
Paper: https://arxiv.org/abs/2002.12213<br><br>

3. Instance Credibility Inference for Few-Shot Learning<br>
Paper: https://arxiv.org/abs/2003.11853<br>
Code: https://github.com/Yikai-Wang/ICI-FSL<br><br>

<br><br>

<a name="13"/>

### Weakly supervised / unsupervised / self-supervised

1. Rethinking the Route Towards Weakly Supervised Object Localization<br>
Paper: https://arxiv.org/abs/2002.11359<br><br>

2. NestedVAE: Isolating Common Factors via Weak Supervision<br>
Paper: https://arxiv.org/abs/2002.11576<br><br>

3. Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation<br>
Paper: https://arxiv.org/abs/1911.07450<br><br>

4. Disentangling Physical Dynamics from Unknown Factors for Unsupervised Video Prediction<br>
Paper: https://arxiv.org/abs/2003.01460<br><br>

5. ClusterFit: Improving Generalization of Visual Representations<br>
Paper: https://arxiv.org/abs/1912.03330<br><br>

6. Auto-Encoding Twin-Bottleneck Hashing<br>
Paper: https://arxiv.org/abs/2002.11930<br><br>

7. Learning Representations by Predicting Bags of Visual Words<br>
Paper: https://arxiv.org/abs/2002.12247<br><br>

8. A Characteristic Function Approach to Deep Implicit Generative Modeling<br>
Paper: https://arxiv.org/abs/1909.07425<br><br>

9. Unsupervised Learning of Intrinsic Structural Representation Points<br>
Paper: https://arxiv.org/abs/2003.01661<br>
Code: https://github.com/NolenChen/3DStructurePoints<br><br>


<br><br>

<a name="14"/>

### Pedestrian tracking / pedestrian detection / ReID
1. Cross-modality Person re-identification with Shared-Specific Feature Transfer<br>
Paper: https://arxiv.org/abs/2002.12489<br><br>

2. Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction<br>
Paper: https://arxiv.org/abs/2002.11927<br><br>

3. The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction<br>
Paper: https://arxiv.org/abs/1912.06445<br><br>

<br><br>

<a name="15"/>

### Neural networks / model compression / model acceleration

1. GhostNet: More Features from Cheap Operations<br>
Paper: https://arxiv.org/abs/1911.11907<br>
Code: https://github.com/iamhankai/ghostnet<br><br>

2. Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions<br>
Paper: https://arxiv.org/abs/2003.01826<br><br>

3. GPU-Accelerated Mobile Multi-view Style Transfer<br>
Paper: https://arxiv.org/abs/2003.00706<br><br>

4. Bundle Adjustment on a Graph Processor<br>
Paper: https://arxiv.org/abs/2003.03134<br>
Code: https://github.com/joeaortiz/gbp<br><br>

5. Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions<br>
Paper: https://arxiv.org/abs/2003.01826<br><br>

6. Holistically-Attracted Wireframe Parsing<br>
Paper: https://arxiv.org/abs/2003.01663<br><br>

7. AdderNet: Do We Really Need Multiplications in Deep Learning?<br>
Paper: https://arxiv.org/abs/1912.13200<br><br>

8. CARS: Continuous Evolution for Efficient Neural Architecture Search<br>
Paper: https://arxiv.org/abs/1909.04977<br>
Code: https://github.com/huawei-noah/CARS<br><br>

9. Π-nets: Deep Polynomial Neural Networks<br>
Paper: https://arxiv.org/abs/2003.03828<br><br>

10. Explaining Knowledge Distillation by Quantifying the Knowledge<br>
Paper: https://arxiv.org/abs/2003.03622<br><br>

<br><br>

<a name="16"/>

### Super-resolution

1. Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution<br>
Paper: https://arxiv.org/abs/2002.11616<br><br>

2. Closed-loop Matters: Dual Regression Networks for Single Image Super-Resolution<br>
Paper: https://arxiv.org/abs/2003.07018<br>
Code: https://github.com/guoyongcs/DRN<br><br>

<br><br>

<a name="17"/>

### Visual commonsense / other

1. Visual Commonsense R-CNN<br>
Paper: https://arxiv.org/abs/2002.12204<br>
Code: https://github.com/Wangt-CN/VC-R-CNN<br><br>

2. Scalable Uncertainty for Computer Vision with Functional Variational Inference<br>
Paper: https://arxiv.org/abs/2003.03396<br><br>

3. Deep Representation Learning on Long-tailed Data: A Learnable Embedding Augmentation Perspective<br>
Paper: https://arxiv.org/abs/2002.10826<br><br>

4. Representations, Metrics and Statistics For Shape Analysis of Elastic Graphs<br>
Paper: https://arxiv.org/abs/2003.00287<br><br>

5. Filter Grafting for Deep Neural Networks<br>
Paper: https://arxiv.org/abs/2001.05868<br>
Code: https://github.com/fxmeng/filter-grafting.git<br><br>

6. 12-in-1: Multi-Task Vision and Language Representation Learning<br>
Paper: https://arxiv.org/abs/1912.02315<br><br>

7. Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training<br>
Paper: https://arxiv.org/abs/2002.10638<br>
Code: https://github.com/weituo12321/PREVALENT<br><br>

8. Unbiased Scene Graph Generation from Biased Training<br>
Paper: https://arxiv.org/abs/2002.11949<br><br>

9. Towards Visually Explaining Variational Autoencoders<br>
Paper: https://arxiv.org/abs/1911.07389<br><br>

10. BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition<br>
Paper: http://www.weixiushen.com/publication/cvpr20_BBN.pdf<br>
Code: https://github.com/Megvii-Nanjing/BBN<br><br>

11. High Frequency Component Helps Explain the Generalization of Convolutional Neural Networks<br>
Paper: https://arxiv.org/abs/1905.13545<br><br>

12. SAM: The Sensitivity of Attribution Methods to Hyperparameters<br>
Paper: http://s.anhnguyen.me/sam_cvpr2020.pdf<br>
Code: https://github.com/anguyen8/sam<br><br>

13. Π-nets: Deep Polynomial Neural Networks<br>
Paper: https://arxiv.org/abs/2003.03828<br><br>

14. Towards Backward-Compatible Representation Learning<br>
Paper: https://arxiv.org/abs/2003.11942<br><br>

15. On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location<br>
Paper: https://arxiv.org/abs/2003.07064<br><br>

16. KeypointNet: A Large-scale 3D Keypoint Dataset Aggregated from Numerous Human Annotations (dataset)<br>
Paper: https://arxiv.org/abs/2002.12687<br><br>


<br><br>

<a name="101"/>

# 2. CVPR2020 Oral (continuously updated)<br>
[1. PolarMask: Single Shot Instance Segmentation with Polar Representation](https://arxiv.org/abs/1909.13226)<br>
Code: https://github.com/xieenze/PolarMask <br><br>

[2. Unbiased Scene Graph Generation from Biased Training](https://arxiv.org/abs/2002.11949) <br>
Code: https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch <br><br>

[3. Learning to Shade Hand-drawn Sketches](https://arxiv.org/abs/2002.11812) <br>
Code: https://github.com/qyzdao/ShadeSketch <br><br>

[4. SAM: The Sensitivity of Attribution Methods to Hyperparameters](http://s.anhnguyen.me/sam_cvpr2020.pdf)<br>
Code: https://github.com/anguyen8/sam<br><br>

[5. High Frequency Component Helps Explain the Generalization of Convolutional Neural Networks](https://arxiv.org/abs/1905.13545)<br><br>

[6. Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task](https://arxiv.org/abs/1912.00623) <br><br>

[7. RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds](https://arxiv.org/abs/1911.11236)<br><br>

[8. AdderNet: Do We Really Need Multiplications in Deep Learning? ](https://arxiv.org/abs/1912.13200 )<br><br>

[9. Multi-Modal Domain Adaptation for Fine-Grained Action Recognition](https://arxiv.org/abs/2001.09691 )<br><br>

[10. Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation](https://arxiv.org/abs/2003.08813)<br><br>

[11. Deep Spatial Gradient and Temporal Depth Learning for Face Anti-spoofing](https://arxiv.org/abs/2003.08061)<br>
https://github.com/clks-wzz/FAS-SGTD<br><br>

[12. Learning Meta Face Recognition in Unseen Domains](https://arxiv.org/abs/2003.07733)<br>
https://github.com/cleardusk/MFR<br><br>

[13. Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching](https://arxiv.org/abs/1912.06378)<br>
https://github.com/alibaba/cascade-stereo<br><br>

[14. BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition](http://www.weixiushen.com/publication/cvpr20_BBN.pdf)<br>
https://github.com/Megvii-Nanjing/BBN<br><br>

[15. High Frequency Component Helps Explain the Generalization of Convolutional Neural Networks](https://arxiv.org/abs/1905.13545)<br><br>

[16. SAM: The Sensitivity of Attribution Methods to Hyperparameters](http://s.anhnguyen.me/sam_cvpr2020.pdf)<br>
https://github.com/anguyen8/sam<br><br>

[17. Towards Backward-Compatible Representation Learning](https://arxiv.org/abs/2003.11942)<br><br>

[18. MaskFlownet: Asymmetric Feature Matching with Learnable Occlusion Mask](https://arxiv.org/abs/2003.10955)<br>
https://github.com/microsoft/MaskFlownet<br><br>

[19. Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects](https://arxiv.org/abs/2003.12045)<br>
https://ehsanik.github.io/forcecvpr2020<br><br>

[20. StyleRig: Rigging StyleGAN for 3D Control over Portrait Images](https://arxiv.org/abs/2004.00121)<br><br>

[21. Conditional Channel Gated Networks for Task-Aware Continual Learning](https://arxiv.org/abs/2004.00070)

[22. BANet: Bidirectional Aggregation Network with Occlusion Handling for Panoptic Segmentation](https://arxiv.org/abs/2003.14031)<br><br>

[23. TITAN: Future Forecast using Action Priors](https://arxiv.org/abs/2003.13886)<br><br>

[24. Learning Interactions and Relationships between Movie Characters](https://arxiv.org/abs/2003.13158)<br><br>

[25. GPS-Net: Graph Property Sensing Network for Scene Graph Generation](https://arxiv.org/abs/2003.12962)<br>
https://github.com/taksau/GPS-Net<br><br>

[26. A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising](https://arxiv.org/abs/2003.12751)<br>
https://github.com/Vandermode/NoiseModel<br><br>

[27. Controllable Person Image Synthesis with Attribute-Decomposed GAN](https://arxiv.org/abs/2003.12267)<br>
https://menyifang.github.io/projects/ADGAN/ADGAN.html<br><br>

[28. Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations](https://arxiv.org/abs/2003.12237)<br><br>

[29. Learning to Optimize Non-Rigid Tracking](https://arxiv.org/abs/2003.12230)<br><br>

[30. Self-Supervised Scene De-occlusion](https://arxiv.org/abs/2004.02788)<br>
https://xiaohangzhan.github.io/projects/deocclusion/<br><br>

[31. Robust 3D Self-portraits in Seconds](https://arxiv.org/abs/2004.02460)<br><br>

[32. Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics](https://arxiv.org/abs/2004.02331)<br><br>

[33. Light Field Spatial Super-resolution via Deep Combinatorial Geometry Embedding and Structural Consistency Regularization](https://arxiv.org/abs/2004.02215)<br><br>

[34. Google Landmarks Dataset v2 -- A Large-Scale Benchmark for Instance-Level Recognition and Retrieval](https://arxiv.org/abs/2004.01804)<br><br>

[35. Deep White-Balance Editing](https://arxiv.org/abs/2004.01354)<br><br>

[36. Tracking by Instance Detection: A Meta-Learning Approach](https://arxiv.org/abs/2004.00830)<br><br>



<br><br>

<a name="102"/>

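# 3. CVPR2020 paper interpretations<br><br>


### [15. Unsupervised visual commonsense feature learning: a small exploration of causality](https://zhuanlan.zhihu.com/p/111306353)<br>
More and more researchers are looking at how to apply causality from statistics to deep learning in order to improve robustness, interpretability, and so on. Most of this work, however, does not go deep into causal theory and mainly borrows a few of its concepts (such as counterfactuals). This paper aims to go one step further on that basis.<br>
Paper: https://arxiv.org/abs/2002.12204<br>
Code: https://github.com/Wangt-CN/VC-R-CNN<br><br>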

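### [14. CVPR2020 | The newest and most complete open-source framework for scene graph generation (SGG), with the most comprehensive set of metrics to date](https://mp.weixin.qq.com/s/Nj6GjpRG8qG1ihhcoY9SwQ)<br>
The authors chose facebookresearch/maskrcnn-benchmark, a popular framework in 2019, as the foundation and built Scene-Graph-Benchmark.pytorch on top of it. The code is compatible with every detector model that maskrcnn-benchmark supports and, thanks to the quality of the facebookresearch codebase, makes the SGG part considerably more readable and easier to work with.<br>
Paper: https://arxiv.org/abs/2002.11949<br>
Code: https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch<br><br>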


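### [13. CVPR2020 | Megvii Research proposes a monocular 6DoF pose estimation algorithm based on a 3D keypoint voting network (open-sourced)](https://mp.weixin.qq.com/s/c8rQYj5lSOtI1iza9e0Dpw)<br>
Paper: https://arxiv.org/abs/1911.04231<br>
Code: https://github.com/ethnhe/PVN3D.git<br>
Megvii Research proposes PVN3D, a 3D keypoint detection network based on Hough voting that learns per-point offsets to 3D keypoints and votes for those keypoints. It extends 2D-keypoint-based methods to 3D keypoints so as to make full use of the geometric constraints of rigid bodies, which greatly improves the accuracy of 6DoF estimation. In evaluations on the two public datasets YCB-Video and LineMOD, the method achieved the best performance at the time by a large margin.
<br><br>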
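
The voting idea described above can be illustrated with a minimal Python sketch. It assumes that every object point comes with a predicted offset to one keypoint and, as a simplification, aggregates the votes by plain averaging. The function name `vote_keypoint` and the averaging step are illustrative assumptions, not the PVN3D implementation.

```python
from statistics import fmean


def vote_keypoint(points, offsets):
    """Aggregate per-point votes for a single 3D keypoint.

    points  -- list of (x, y, z) object points
    offsets -- list of (dx, dy, dz) predicted offsets, one per point
    """
    if not points or len(points) != len(offsets):
        raise ValueError("need at least one point and exactly one offset per point")
    # Each point votes for the keypoint location: point + predicted offset.
    votes = [
        tuple(p + d for p, d in zip(point, offset))
        for point, offset in zip(points, offsets)
    ]
    # Simplifying assumption: average the votes per axis.
    # A clustering step could replace this to reject outlier votes.
    return tuple(fmean(axis) for axis in zip(*votes))


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    offs = [(1.0, 1.0, 1.0), (-1.0, 1.0, 1.0)]
    print(vote_keypoint(pts, offs))
```

Because every point contributes a vote, a few badly predicted offsets have limited influence on the aggregated keypoint, which is the intuition behind voting-based detection.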


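### [12. Cross-modality person re-identification: the shared-specific feature transfer algorithm cm-SSFT](https://mp.weixin.qq.com/s/qPc71o2JeMDpDgRxDtp2BA)<br>
Paper: https://arxiv.org/abs/2002.12489<br>
The work focuses on infrared-RGB cross-modality person re-identification. The problem it tackles: most earlier cross-modality ReID algorithms attend only to shared feature learning and rarely to specific features, because specific features do not exist in the other modality. An infrared image carries no color information, for example, and a color image carries no thermal information. Yet anyone who has worked on ReID knows that conventional ReID performs so well largely because it "overfits" to this specific information; clothing color, for instance, has always been an important cue. Starting from this observation, the paper tries to exploit specific features. The main idea is to use neighborhood information. Given an infrared query, when searching for color targets, one can first find some easy, high-confidence color samples (which are very likely positives for the infrared query) and give their color-specific features to the infrared query. The infrared query can then use this color information to search for harder color samples.<br><br>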

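### [11. RandLA-Net: a new framework for semantic segmentation of large-scale 3D point clouds (open-sourced)](https://mp.weixin.qq.com/s/xuLJ8m_ipGVBXVduA7Y0IA)<br>
Paper: https://arxiv.org/abs/1911.11236<br>
Code: https://github.com/QingyongHu/RandLA-Net<br>
The paper proposes RandLA-Net, a network architecture based on simple, efficient random down-sampling and local feature aggregation. The method performs very well on large-scale point cloud segmentation datasets such as Semantic3D and SemanticKITTI and is also highly efficient (e.g., nearly 200 times faster than the graph-based method SPG).<br><br>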
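
As a rough illustration of the random down-sampling step only, the following Python sketch keeps a uniformly random fraction of a point cloud. The function name `random_downsample`, the `ratio` parameter, and the `seed` argument are assumptions made for this example; the local feature aggregation that the summary also mentions is not modeled here.

```python
import random


def random_downsample(points, ratio, seed=None):
    """Return a uniformly random subset containing about `ratio` of the points."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    if not points:
        return []
    rng = random.Random(seed)  # a local generator keeps the global RNG state untouched
    keep = max(1, int(len(points) * ratio))
    # Sampling without replacement: every point is kept at most once.
    return rng.sample(points, keep)


if __name__ == "__main__":
    cloud = [(float(i), 0.0, 0.0) for i in range(100)]
    subset = random_downsample(cloud, ratio=0.25, seed=0)
    print(len(subset))
```

Random selection does not depend on the geometry of the cloud, so its cost grows only with the number of points kept, which is one reason it suits very large scenes.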


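### [10. Tencent releases a strong few-shot object detection algorithm and publishes FSOD, a 1,000-category few-shot detection training set](https://mp.weixin.qq.com/s/TRRsBGzMir0ttzjTdXwJCw)<br>
Paper: https://arxiv.org/abs/1908.01998<br>
The paper proposes a new few-shot object detection algorithm whose contributions include Attention-RPN, a multi-relation detector, and a contrastive training strategy. It also builds FSOD, a few-shot detection dataset with 1,000 categories. A model trained on FSOD can be applied directly to detecting novel categories without fine-tuning.<br><br>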

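### [9. CARS: Huawei proposes neural architecture search based on evolutionary algorithms and weight sharing, needing only half a day on a single GPU on CIFAR-10](https://mp.weixin.qq.com/s/GAL-hbERLp6vS2zB_I9jxg)
<br>
Paper: https://arxiv.org/abs/1909.04977<br>
To address the excessive time that evolutionary algorithms spend training candidate networks during neural architecture search, and drawing on ENAS and NSGA-III, the paper proposes continuous evolution architecture search (CARS), which makes maximal use of knowledge already learned, such as the architectures and parameters from the previous round of evolution. It first builds a supernet for parameter sharing and generates subnets from it, then uses a non-dominated sorting strategy to select strong networks of different sizes. The whole search takes only 0.5 GPU days.<br><br>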
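
The selection step mentioned above relies on non-dominated sorting. A minimal Python sketch of extracting the first non-dominated front is shown below, assuming each candidate network is summarized by a hypothetical `(accuracy, size)` pair in which higher accuracy and smaller size are better. This illustrates the general Pareto idea only and is not the selection procedure from the CARS code.

```python
def dominates(a, b):
    """Return True if candidate a dominates candidate b.

    Each candidate is an (accuracy, size) pair: higher accuracy is better,
    smaller size is better. a dominates b when it is no worse on both
    objectives and strictly better on at least one.
    """
    acc_a, size_a = a
    acc_b, size_b = b
    no_worse = acc_a >= acc_b and size_a <= size_b
    strictly_better = acc_a > acc_b or size_a < size_b
    return no_worse and strictly_better


def non_dominated(candidates):
    """Return the candidates that no other candidate dominates (the first front)."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


if __name__ == "__main__":
    # Hypothetical (accuracy, size in millions of parameters) pairs.
    population = [(0.90, 5.0), (0.92, 8.0), (0.89, 6.0), (0.92, 9.0)]
    print(non_dominated(population))
```

Keeping the whole front rather than a single best candidate is what allows one search to return networks of several different sizes.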

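### [8. Simplicity first: a new SOTA in weakly supervised object localization, the pseudo supervised object localization method (PSOL)](https://mp.weixin.qq.com/s/6G7BG8DrKZ0Zvi-BUqg78w)<br>

Paper: https://arxiv.org/abs/2002.11359<br>
The paper proposes pseudo supervised object localization (PSOL) to address problems with current weakly supervised object localization methods. It splits localization and classification into two independent networks, then uses Deep Descriptor Transformation (DDT) to generate pseudo ground-truth boxes on the training set for training; the overall result reaches SOTA. The paper makes three main contributions: (1) weakly supervised object localization should be divided into two independent parts, class-agnostic object localization and object classification, which leads to the PSOL algorithm; (2) although the generated bounding boxes are biased, the paper argues that they should still be optimized directly without class labels, and this ultimately reaches SOTA; (3) across different datasets, PSOL transfers its localization ability well without fine-tuning.<br><br>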


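### [7. ByteDance: anatomy-aware 3D human pose estimation in videos](https://mp.weixin.qq.com/s/ut8CmEZPc3NMDdlgXfUzGg)<br>

Paper: https://arxiv.org/pdf/2002.10322.pdf<br>
In this work, we propose a new solution for 3D human pose estimation in videos. Instead of directly regressing the 3D joint locations, we draw inspiration from the anatomy of the human skeleton and decompose the task into bone direction prediction and bone length prediction, from which the 3D joint locations can be fully derived. Our motivation is that the bone lengths of a human skeleton stay consistent over time. This led us to develop effective techniques that use global information across all frames of a video for high-accuracy bone length prediction. For the bone direction prediction network, we propose a fully convolutional propagating architecture with long skip connections. In essence, it predicts the directions of different bones hierarchically without using any time-consuming memory units (e.g., LSTM). A novel joint shift loss is further introduced to connect the training of the bone length and bone direction prediction networks. Finally, we use an implicit attention mechanism to feed the 2D keypoint visibility scores into the model as extra guidance, which significantly mitigates the depth ambiguity in many challenging poses. Our full model outperforms the previous best results on the Human3.6M and MPI-INF-3DHP datasets, and comprehensive evaluation on these datasets validates the effectiveness of our model.<br><br>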
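
The claim that joint locations follow from bone directions and bone lengths can be made concrete with a small Python sketch. It assumes a kinematic tree given as a `parents` list ordered so that every parent precedes its children, and places each joint at its parent's position plus the bone length times the unit bone direction. The function name and data layout are assumptions for illustration, not the authors' code.

```python
import math


def joints_from_bones(root, parents, directions, lengths):
    """Reconstruct 3D joint positions from bone directions and bone lengths.

    root       -- (x, y, z) position of the root joint
    parents    -- parents[i] is the parent index of joint i, or -1 for the root;
                  parents must appear before their children
    directions -- directions[i] is the (dx, dy, dz) direction of the bone that
                  ends at joint i (ignored for the root)
    lengths    -- lengths[i] is the length of that bone (ignored for the root)
    """
    joints = []
    for i, parent in enumerate(parents):
        if parent < 0:
            joints.append(tuple(root))
            continue
        dx, dy, dz = directions[i]
        norm = math.sqrt(dx * dx + dy * dy + dz * dz)
        if norm == 0:
            raise ValueError(f"bone {i} has a zero-length direction vector")
        scale = lengths[i] / norm  # normalize the direction, then apply the length
        px, py, pz = joints[parent]
        joints.append((px + dx * scale, py + dy * scale, pz + dz * scale))
    return joints


if __name__ == "__main__":
    skeleton = joints_from_bones(
        root=(0.0, 0.0, 0.0),
        parents=[-1, 0, 1],
        directions=[None, (0.0, 0.0, 2.0), (1.0, 0.0, 0.0)],
        lengths=[0.0, 0.5, 0.25],
    )
    print(skeleton)
```

Since the lengths enter the reconstruction separately from the directions, a single length estimate per bone can be shared across all frames of a video, in line with the motivation stated above.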


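### [6. Microsoft Research Asia: X-raying Deepfake faces, a new model that exposes face-swapped images](https://mp.weixin.qq.com/s/DLxqGFm6IRffPa8A0XBc4w)<br>

Paper: https://arxiv.org/pdf/1912.13458.pdf<br>
Microsoft Research Asia proposes a method that can "X-ray" an image to determine whether a face has been swapped and to indicate the boundary of the swap, without needing to know the face-swapped image data or the face-swapping algorithm.
The new model, Face X-ray, has two key properties: it generalizes to unseen face-swapping algorithms, and it provides an interpretable swap boundary. The key to these properties lies in the general pipeline of face-swapping algorithms. Most of them can be divided into three stages: detection, modification, and blending. Unlike earlier work, Face X-ray aims to detect the errors introduced in the third stage.<br><br>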

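### [5. UDP: an unbiased data processing method for human pose estimation](https://mp.weixin.qq.com/s/J1Y0tSIpfTOZ4J-9PPyhag)<br>

Paper: https://arxiv.org/abs/1911.07524<br>
UDP addresses the large statistical error in the standard encoding-decoding methods used by existing SOTA human pose estimation algorithms, and also fixes the misalignment of results caused by flip testing. The method is plug-and-play and improves performance with almost no increase in model complexity.<br><br>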

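### [4. Making composite images more realistic: Shanghai Jiao Tong University proposes image harmonization based on domain verification](https://mp.weixin.qq.com/s/JgQ7bgc_bfgWE-PmJMKtOA)<br>

Paper: https://arxiv.org/abs/1911.13239<br>
In a composite image, the foreground and background were captured under different conditions (such as time of day, season, lighting, and weather), so they are clearly mismatched in brightness, color, and similar properties. Image harmonization aims to adjust the foreground of a composite image so that it is harmonious with the background. Traditional image harmonization methods generally transfer color information from the background or from other images to the foreground, but this cannot guarantee that the adjusted foreground looks realistic and harmonious with the background. In recent years a small number of works have tried deep learning for image harmonization, but paired composite and real images are extremely hard to obtain. Without such pairs, training lacks sufficiently strong supervision, and there is no ground truth against which to evaluate harmonized results. Because no large-scale public image harmonization dataset existed, we **built and released an image harmonization dataset made up of four sub-datasets. We also propose the concept of domain verification and explore an image harmonization algorithm based on it.**<br><br>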

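### [3. PolarMask: a new approach to one-stage instance segmentation](https://zhuanlan.zhihu.com/p/84890413)<br>

Paper: https://arxiv.org/abs/1909.13226<br>
PolarMask builds on FCOS and brings instance segmentation into the FCN framework. FCOS is essentially an FCN-style dense-prediction detection framework whose performance is on par with anchor-based object detection methods, which showed the community the potential of anchor-free methods. The next problem to solve is instance segmentation. The main contribution of this work is to turn the more complex instance segmentation problem into a task that is as complex as object detection in terms of network design and computational cost, making the modeling of instance segmentation simple and efficient.<br><br>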
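
The paper title refers to a polar representation of the mask. The following Python sketch shows one way such a representation can be decoded, assuming a mask contour is described by a center point and a list of ray lengths at evenly spaced angles. The function name `polar_to_contour` and the even angular spacing are assumptions for illustration; the summary above does not spell out these details.

```python
import math


def polar_to_contour(center, distances):
    """Convert a center and per-ray distances into contour points.

    center    -- (cx, cy) coordinates of the instance center
    distances -- ray lengths; ray i points at angle i * 2*pi / len(distances)
    """
    if not distances:
        raise ValueError("at least one ray distance is required")
    cx, cy = center
    step = 2 * math.pi / len(distances)  # assumed: rays are evenly spaced
    return [
        (cx + d * math.cos(i * step), cy + d * math.sin(i * step))
        for i, d in enumerate(distances)
    ]


if __name__ == "__main__":
    # Four rays of length 1 around (1, 1) give points right, up, left, and down.
    for x, y in polar_to_contour((1.0, 1.0), [1.0, 1.0, 1.0, 1.0]):
        print(round(x, 6), round(y, 6))
```

Under this assumption a mask is described by one center plus a fixed number of scalars, which is close in form to regressing the four sides of a bounding box.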

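### [2. Huawei's GhostNet surpasses Google's MobileNet (open-sourced)](https://mp.weixin.qq.com/s/Wg_BQpo_3K_fumeelDvUxA)<br>

Paper: https://arxiv.org/abs/1911.11907<br>
The paper introduces a new Ghost module that aims to generate more feature maps through cheap operations. Starting from a set of original feature maps, the authors apply a series of linear transformations to generate, at low cost, many "ghost" feature maps that bring out the needed information from the original features. The Ghost module is plug-and-play: stacking Ghost modules yields the Ghost bottleneck, from which the lightweight network GhostNet is built. On ImageNet classification, GhostNet reaches 75.7% top-1 accuracy at a similar computational cost, higher than MobileNetV3's 75.2%.<br><br>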
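
The idea of deriving extra feature maps from a few original ones can be sketched in plain Python. The example below is a toy under stated assumptions: each feature map is a flat list of floats, and each cheap operation is an element-wise function supplied by the caller, standing in for whatever inexpensive linear transformation a real network would use. The name `ghost_module` is hypothetical and this is not the GhostNet implementation.

```python
def ghost_module(primary_maps, cheap_ops):
    """Return the primary feature maps followed by their "ghost" maps.

    primary_maps -- list of feature maps, each a flat list of floats
    cheap_ops    -- list of element-wise functions; each is applied to every
                    primary map to produce one ghost map
    """
    ghosts = [
        [op(value) for value in feature_map]
        for feature_map in primary_maps
        for op in cheap_ops
    ]
    # Output count: len(primary_maps) * (1 + len(cheap_ops)).
    return primary_maps + ghosts


if __name__ == "__main__":
    primary = [[1.0, 2.0], [3.0, 4.0]]
    ops = [lambda v: 2.0 * v, lambda v: -v]  # toy stand-ins for cheap linear ops
    for feature_map in ghost_module(primary, ops):
        print(feature_map)
```

In this toy, two primary maps and two cheap operations yield six output maps, so only a third of the outputs need the expensive computation that produced the primary maps.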

### [1. Georgia Tech's Devi Parikh: multi-task vision and language representation learning](https://mp.weixin.qq.com/s/8CvUT9JvnysIXay7vyY16w)<br>

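Paper link: https://arxiv.org/abs/1912.02315<br>
Much vision-and-language research focuses on a small but diverse set of independent tasks and their supporting datasets, which are usually studied in isolation; however, the visual-language understanding skills needed to succeed at these tasks overlap significantly. In this work, we investigate the relationships between vision-and-language tasks by developing a large-scale, multi-task training regime.<br><br>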


<br><br>

<a name="103"/>

# 4.To do list<br>
* Keep CVPR2020 reproduction code up to date<br>
* Follow up on CVPR2020 paper sharing sessions<br>

<br><br>

<a name="104"/>

# 5.Related links<br>
* [CVPR2019/2018/2017 complete resource downloads (papers, code, etc.)](https://github.com/extreme-assistant/cvpr2020/blob/master/README.md)<br>
* https://github.com/extreme-assistant/iccv2019<br><br>


# 6.CVPR2020 contributors Wechat Group<br>
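To make it easier for everyone to communicate, Jishi (极市) has set up a contributors group and an authors' WeChat group. To join, add the assistant on WeChat (cv-mart, with the note CVPR2020).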


================================================
FILE: CVPR2021.md
================================================
* Recommended reading:<br>
  * [ICCV2021/2019/2017 papers/code/interpretations/livestream collection](https://github.com/extreme-assistant/ICCV2021-Paper-Code-Interpretation)
  * [2020-2021 computer vision survey paper roundup](https://github.com/extreme-assistant/survey-computer-vision)
  * [Roundup of leading computer vision teams in China and abroad](https://github.com/extreme-assistant/Awesome-CV-Team)
------

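# CVPR2021 latest information and paper downloads (Papers/Codes/Project/PaperReading/Demos/livestreams/paper sharing sessions, etc.)

Official site: http://cvpr2021.thecvf.com<br>
Dates: June 19-25, 2021<br>
Paper acceptance announced: February 28, 2021<br>

Related questions:<br>

* [What do you think of the CVPR 2021 paper acceptance results?](https://www.zhihu.com/question/446299297)<br>
* [CVPR 2021 acceptance results are out! 1663 papers accepted and the acceptance rate is up. Did your paper get in? (paper downloads included)](https://mp.weixin.qq.com/s/4UQ2W1V-eLnL02L8BDOtMg)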

<br><br>

# Contents

[1. CVPR2021 accepted papers/code by topic (updating)](#1)<br>
[2. CVPR2021 Oral (updating)](#2)<br>
[3. CVPR2021 paper interpretation roundup (updating)](#3)<br>
[4. CVPR2021 Workshop](#4)<br>
[5. To do list](#4)<br>


<br>

<a name="1"/> 

# 1.CVPR2021 accepted papers/code organized by topic (continuously updated)


## Category index:

### [1. Detection](#detection)
* [2D Object Detection](#IOD)
  * [All CVPR2021 2D object detection papers in one article (27 papers)](https://bbs.cvmart.net/articles/5120)
* [Video Object Detection](#VOD)
* [3D Object Detection](#3DOD)
* [Human-Object Interaction (HOI) Detection](#HOI)
* [Camouflaged Object Detection](#COD)
* [Rotation Object Detection](#ROD)
* [Saliency Object Detection](#SOD)
* [Anomaly Detection in Image](#ADI)
  * [All CVPR2021 image anomaly detection papers in one article (6 papers)](https://bbs.cvmart.net/articles/5127)
* [Keypoint Detection](#KeypointDetection)

### [2. Segmentation](#Segmentation)
* [Image Segmentation](#ImageSegmentation)
* [Panoptic Segmentation](#PanopticSegmentation)
* [Semantic Segmentation](#SemanticSegmentation)
* [Instance Segmentation](#InstanceSegmentation)
* [Superpixel](#Superpixel)
* [Video Object Segmentation](#VOS)
* [Matting](#Matting)
* [Dense Prediction](#DensePrediction)

### [3. Image Processing](#ImageProcessing)

* [Super Resolution](#SuperResolution)
* [Image Restoration/Image Enhancement](#ImageRestoration)
* [Image Shadow Removal/Image Reflection Removal](#ISR)
* [Image Denoising/Deblurring/Deraining/Dehazing](#ImageDenoising)
* [Image Editing/Image Inpainting](#ImageEdit)
* [Image Translation](#ImageTranslation)
* [Image Quality Assessment](#IQA)
* [Style Transfer](#StyleTransfer)

### [4. Estimation](#Estimation)
* [Pose Estimation](#HumanPoseEstimation)
* [Gesture Estimation](#GestureEstimation)
* [Flow/Pose/Motion Estimation](#Flow/Pose/MotionEstimation)
* [Depth Estimation](#DepthEstimation)

### [5. Image&Video Retrieval/Video Understanding](#ImageRetrieval)
* [Action/Activity Recognition/Detection/Segmentation](#ActionRecognition)
* [Person Re-Identification/Detection](#Re-Identification)
* [Image/Video Caption](#VideoCaption)

### [6. Face](#Face)
* [Facial Recognition/Detection](#FacialRecognition)
* [Face Generation/Face Synthesis/Face Reconstruction/Face Editing](#FaceSynthesis)
* [Face Forgery/Face Anti-Spoofing](#FaceAnti-Spoofing)

### [7. 3D Vision](#3DVision)
* [Point Cloud](#3DPC)
* [3D Reconstruction](#3DReconstruction)

### [8. Object Tracking](#ObjectTracking)

### [9. Medical Imaging](#MedicalImaging)

### [10. Text Detection/Recognition](#TDR)

### [11. Remote Sensing Image](#RSI)

### [12. GAN/Generative/Adversarial](#GAN)

### [13. Image Generation/Image Synthesis](#IGIS)
* [View Synthesis](#ViewSynthesis)

### [14. Scene Graph](#SG)
* [Scene Graph Generation](#SGG)
* [Scene Graph Prediction](#SGP)
* [Scene Graph Understanding](#SGU)

### [15. Visual Localization](#VisualLocalization)

### [16. Visual Reasoning/VQA](#VisualReasoning)

### [17. Image Classification](#ImageClassification)

### [18. Neural Network Structure Design](#NNS)
* [Transformer](#Transformer)
* [Graph Neural Network (GNN)](#GNN)
* [Neural Architecture Search (NAS)](#NAS)

### [19. Model Compression](#ModelCompression)
* [Knowledge Distillation](#KnowledgeDistillation)
* [Pruning](#Pruning)
* [Quantization](#Quantization)

### [20. Model Training/Generalization](#ModelTraining)
* [Noisy Label](#NoisyLabel)
* [Long-Tailed Distribution](#Long-Tailed)

### [21. Model Evaluation](#ModelEvaluation)

### [22. Data Processing](#DataProcessing)
* [Data Augmentation](#DataAugmentation)
* [Representation Learning](#RepresentationLearning)
* [Normalization/Regularization (Batch Normalization)](#BatchNormalization)
* [Image Clustering](#ImageClustering)
* [Image Compression](#ImageCompression)
* [Anomaly Detection](#AnomalyDetection)

### [23. Active Learning](#ActiveLearning)

### [24. Few-shot/Zero-shot Learning](#Few-shotLearning)

### [25. Continual Learning/Life-long Learning](#ContinualLearning)

### [26. Transfer Learning/Domain Adaptation](#domain)

### [27. Metric Learning](#MetricLearning)

### [28. Contrastive Learning](#ContrastiveLearning)

### [29. Incremental Learning](#IncrementalLearning)

### [30. Reinforcement Learning](#RL)

### [31. Meta Learning](#MetaLearning)

### [32. Multi-Modal Learning](#MMLearning)
* [Audio-visual Learning](#Audio-VisualLearning)

### [33. Vision-based Prediction](#Vision-basedPrediction)

### [34. Dataset](#Dataset)

### [Uncategorized](#100)



<br><br>

<a name="detection"/> 

## Detection


<a name="IOD"/> 

### 2D Object Detection

[28] Open-Vocabulary Object Detection Using Captions<br>
[paper](https://arxiv.org/abs/2011.10678)<br><br>

[27] Improved Handling of Motion Blur in Online Object Detection<br>
[paper](https://arxiv.org/abs/2011.14448)<br><br>

[26] PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery<br>
[paper](https://arxiv.org/abs/2105.12990)<br><br>

[25] Domain-Specific Suppression for Adaptive Object Detection<br>
[paper](https://arxiv.org/abs/2105.03570)<br><br>

[24] Line Segment Detection Using Transformers without Edges<br>
[paper](https://arxiv.org/abs/2101.01909)<br><br>

[23] IQDet: Instance-wise Quality Distribution Sampling for Object Detection<br>
[paper](https://arxiv.org/abs/2104.06936)<br><br>

[22] Adaptive Class Suppression Loss for Long-Tail Object Detection<br>
[paper](https://arxiv.org/abs/2104.00885) | [code](https://github.com/CASIA-IVA-Lab/ACSL)<br><br>

[21] DAP: Detection-Aware Pre-training with Weak Supervision<br>
[paper](https://arxiv.org/abs/2103.16651)<br><br>

[20] Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection<br>
[paper](https://arxiv.org/abs/2103.17115) ｜ [code](https://github.com/hzhupku/DCNet)<br><br>

[19] Scale-aware Automatic Augmentation for Object Detection<br>
[paper](https://arxiv.org/abs/2103.17220) | [code](https://github.com/Jia-Research-Lab/SA-AutoAug)<br><br>

[18] Data-Uncertainty Guided Multi-Phase Learning for Semi-Supervised Object Detection<br>
[paper](https://arxiv.org/abs/2103.16368)<br><br>

[17] OTA: Optimal Transport Assignment for Object Detection<br>
[paper](https://arxiv.org/abs/2103.14259) | [code](https://github.com/Megvii-BaseDetection/OTA)<br><br>

[16] Distilling Object Detectors via Decoupled Features<br>
[paper](https://arxiv.org/abs/2103.14475) | [code](https://github.com/ggjy/DeFeat.pytorch)<br><br>

[15] I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors<br>
[paper](https://arxiv.org/abs/2103.13757)<br><br>

[14] Robust and Accurate Object Detection via Adversarial Learning<br>
[paper](https://arxiv.org/abs/2103.13886) | [model](https://github.com/google/automl/tree/master/efficientdet/Det-AdvProp.md)<br><br>

[13] You Only Look One-level Feature<br>
[paper](https://arxiv.org/pdf/2103.09460.pdf) | [code](https://github.com/megvii-model/YOLOF)<br>
Interpretation: [I dropped FPN for object detection and it still works this well! YOLOF open-sourced: You Only Look One-level Feature](https://mp.weixin.qq.com/s/GMHx3oMZr0qlP9Iy8tWGMw)<br><br>

[12] End-to-End Object Detection with Fully Convolutional Network<br>
[paper](https://arxiv.org/abs/2012.03544) | [code](https://github.com/Megvii-BaseDetection/DeFCN)<br>
Interpretation: [Drop the Transformer: FCN can also do E2E detection](https://zhuanlan.zhihu.com/p/332281368)<br><br>

[11] FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding<br>
[paper](https://arxiv.org/abs/2103.05950v2) ｜ [code](https://github.com/MegviiDetection/FSCE)<br><br>

[10] Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection<br>
[paper](https://arxiv.org/pdf/2011.12885.pdf) | [code](https://github.com/implus/GFocalV2) <br>
Interpretation: [Generalized Focal Loss V2 in plain language](https://zhuanlan.zhihu.com/p/313684358)<br><br>

[9] MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection<br>
[paper](https://arxiv.org/pdf/2103.04224.pdf)<br><br>

[8] OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection<br>
[paper](https://arxiv.org/abs/2103.04507) | [code](https://github.com/VDIGPKU/OPANAS)<br><br>

[7] Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection<br>
[paper](https://arxiv.org/abs/2103.01903)<br><br>

[6] General Instance Distillation for Object Detection<br>
[paper](https://arxiv.org/abs/2103.02340)<br><br>

[5] Instance Localization for Self-supervised Detection Pretraining<br>
[paper](https://arxiv.org/pdf/2102.08318.pdf)｜[code](https://github.com/limbo0000/InstanceLoc)<br><br>

[4] Multiple Instance Active Learning for Object Detection<br>
[paper](https://github.com/yuantn/MIAL/raw/master/paper.pdf) | [code](https://github.com/yuantn/MIAL)<br>
Interpretation: [MI-AOD: high detection performance with few samples](https://zhuanlan.zhihu.com/p/362764637)<br><br>

[3] Towards Open World Object Detection<br>
[paper](https://arxiv.org/abs/2103.02603) | [code](https://github.com/JosephKJ/OWOD)<br>
Interpretation: [After object detection was pushed to its limits, someone finally opened up a new direction | CVPR2021 Oral](https://mp.weixin.qq.com/s/1_WC_hTTc9fgSJUqBsfTYQ)<br><br>

[2] Positive-Unlabeled Data Purification in the Wild for Object Detection<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Guo_Positive-Unlabeled_Data_Purification_in_the_Wild_for_Object_Detection_CVPR_2021_paper.pdf)<br><br>

[1] UP-DETR: Unsupervised Pre-training for Object Detection with Transformers<br>
[paper](https://arxiv.org/pdf/2011.09094.pdf) | [code](https://github.com/dddzg/up-detr)<br>
Interpretation: [Unsupervised pre-training for detectors](https://www.zhihu.com/question/432321109/answer/1606004872)<br><br>


<a name="VOD"/> 

### Video Object Detection

[3] Dogfight: Detecting Drones from Drones Videos<br>
[paper](https://arxiv.org/abs/2103.17242)<br><br>

[2] Depth from Camera Motion and Object Detection<br>
[paper](https://arxiv.org/abs/2103.01468)<br><br>

[1] There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge<br>
[paper](https://arxiv.org/abs/2103.01353) | [video](https://www.youtube.com/channel/UCRpM8k1GY3kD2TqCo_yKN3g) | [project](http://rl.uni-freiburg.de/research/multimodal-distill)<br><br>


<a name="3DOD"/> 

### 3D Object Detection

[15] Exploring intermediate representation for monocular vehicle pose estimation<br>
[paper](https://arxiv.org/abs/2011.08464) ｜ [code](https://github.com/Nicholasli1995/EgoNet)<br><br>

[14] SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud<br>
[paper](https://arxiv.org/abs/2104.09804) | [code](https://github.com/Vegeta2020/SE-SSD)<br><br>

[13] Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds<br>
[paper](https://arxiv.org/abs/2104.06114) | [code](https://github.com/cheng052/BRNet)<br><br>

[12] Objects are Different: Flexible Monocular 3D Object Detection<br>
[paper](https://arxiv.org/abs/2104.02323) | [code](https://github.com/zhangyp15/MonoFlex)<br><br>

[11] HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection<br>
[paper](https://arxiv.org/abs/2104.00902)<br><br>

[10] GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection<br>
[paper](https://arxiv.org/abs/2103.17202) | [code](https://github.com/abhi1kumar/groomed_nms)<br><br>

[9] Delving into Localization Errors for Monocular 3D Object Detection<br>
[paper](https://arxiv.org/abs/2103.16237) | [code](https://github.com/xinzhuma/monodle)<br><br>

[8] Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection<br>
[paper](https://arxiv.org/abs/2103.16470) | [code](https://github.com/fudan-zvg/DDMP)<br><br>

[7] LiDAR R-CNN: An Efficient and Universal 3D Object Detector<br>
[paper](https://arxiv.org/abs/2103.15297) | [code](https://github.com/tusimple/LiDAR_RCNN)<br><br>

[6] M3DSSD: Monocular 3D Single Stage Object Detector<br>
[paper](https://arxiv.org/abs/2103.13164)<br><br>

[5] MonoRUn: Monocular 3D Object Detection by Self-Supervised Reconstruction and Uncertainty Propagation<br>
[paper](https://arxiv.org/abs/2103.12605)<br><br>

[4] ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection<br>
[paper](https://arxiv.org/pdf/2103.05346.pdf) | [code](https://github.com/CVMI-Lab/ST3D)<br><br>

[3] Center-based 3D Object Detection and Tracking<br>
[paper](https://arxiv.org/abs/2006.11275) | [code](https://github.com/tianweiy/CenterPoint)<br><br>

[2] 3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object Detection<br>
[paper](https://arxiv.org/pdf/2012.04355.pdf) | [code](https://github.com/THU17cyz/3DIoUMatch) | [project](https://thu17cyz.github.io/3DIoUMatch/) | [video](https://youtu.be/nuARjhkQN2U)<br><br>

[1] Categorical Depth Distribution Network for Monocular 3D Object Detection<br>
[paper](https://arxiv.org/abs/2103.01100)<br><br>

<a name="HOI"/> 

### Human-Object Interaction (HOI) Detection

[7] HOTR: End-to-End Human-Object Interaction Detection with Transformers<br>
[paper](https://arxiv.org/abs/2104.13682)<br><br>

[6] Glance and Gaze: Inferring Action-aware Points for One-Stage Human-Object Interaction Detection<br>
[paper](https://arxiv.org/abs/2104.05269)<br><br>

[5] Affordance Transfer Learning for Human-Object Interaction Detection<br>
[paper](https://arxiv.org/abs/2104.02867) | [code](https://github.com/zhihou7/HOI-CL)<br><br>

[4] Detecting Human-Object Interaction via Fabricated Compositional Learning<br>
[paper](https://arxiv.org/abs/2103.08214) | [code](https://github.com/zhihou7/FCL)<br><br>

[3] Reformulating HOI Detection as Adaptive Set Prediction<br>
[paper](https://arxiv.org/abs/2103.05983) | [code](https://arxiv.org/abs/2103.05983)<br><br>

[2] QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information<br>
[paper](https://arxiv.org/abs/2103.05399) | [code](https://github.com/hitachi-rd-cv/qpic)<br><br>

[1] End-to-End Human Object Interaction Detection with HOI Transformer<br>
[paper](https://arxiv.org/pdf/2103.04503.pdf) | [code](https://github.com/bbepoch/HoiTransformer)<br><br>

<a name="COD"/> 

### Camouflaged Object Detection

[4] Camouflaged Object Segmentation with Distraction Mining<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Mei_Camouflaged_Object_Segmentation_With_Distraction_Mining_CVPR_2021_paper.pdf) | [code](https://mhaiyang.github.io/CVPR2021_PFNet/index)<br><br>

[3] Mutual Graph Learning for Camouflaged Object Detection<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhai_Mutual_Graph_Learning_for_Camouflaged_Object_Detection_CVPR_2021_paper.pdf)<br><br>

[2] Uncertainty-aware Joint Salient Object and Camouflaged Object Detection<br>
[paper](https://arxiv.org/abs/2104.02628)<br><br>

[1] Simultaneously Localize, Segment and Rank the Camouflaged Objects<br>
[paper](https://arxiv.org/abs/2103.04011) | [code](https://github.com/JingZhang617/COD-Rank-Localize-and-Segment)<br><br>


<a name="ROD"/> 

### Rotation Object Detection

[2] ReDet: A Rotation-equivariant Detector for Aerial Object Detection<br>
[paper](https://arxiv.org/abs/2103.07733) | [code](https://github.com/csuhan/ReDet)<br><br>

[1] Dense Label Encoding for Boundary Discontinuity Free Rotation Detection<br>
[paper](https://arxiv.org/abs/2011.09670) | [code](https://github.com/yangxue0827/RotationDetection) | [Interpretation - DCL: a new method for rotated object detection](https://zhuanlan.zhihu.com/p/354373013)<br><br>

<a name="SOD"/> 

### Saliency Object Detection

[4] Calibrated RGB-D Salient Object Detection<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Ji_Calibrated_RGB-D_Salient_Object_Detection_CVPR_2021_paper.pdf) | [code](https://github.com/jiwei0921/DCF)<br><br>

[3] Weakly Supervised Video Salient Object Detection<br>
[paper](https://arxiv.org/abs/2104.02391)<br><br>

[2] Group Collaborative Learning for Co-Salient Object Detection<br>
[paper](https://arxiv.org/abs/2104.01108) | [project](https://github.com/fanq15/GCoNet)<br><br>

[1] Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion<br>
[paper](https://arxiv.org/abs/2103.11832)<br><br>

<a name="ADI"/> 

### Anomaly Detection in Image

[7] Anomaly Detection in Video via Self-Supervised and Multi-Task Learning<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Georgescu_Anomaly_Detection_in_Video_via_Self-Supervised_and_Multi-Task_Learning_CVPR_2021_paper.pdf)<br><br>

[6] MIST: Multiple Instance Self-Training Framework for Video Anomaly Detection<br>
[paper](https://arxiv.org/abs/2104.01633)<br><br>

[5] CutPaste: Self-Supervised Learning for Anomaly Detection and Localization<br>
[paper](https://arxiv.org/abs/2104.04015)<br><br>

[4] CutPaste: Self-Supervised Learning for Anomaly Detection and Localization<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Li_CutPaste_Self-Supervised_Learning_for_Anomaly_Detection_and_Localization_CVPR_2021_paper.pdf)<br><br>

[3] Pixel-wise Anomaly Detection in Complex Driving Scenes<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Di_Biase_Pixel-Wise_Anomaly_Detection_in_Complex_Driving_Scenes_CVPR_2021_paper.pdf)<br><br>

[2] PANDA: Adapting Pretrained Features for Anomaly Detection and Segmentation<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Reiss_PANDA_Adapting_Pretrained_Features_for_Anomaly_Detection_and_Segmentation_CVPR_2021_paper.pdf) | [code](https://github.com/talreiss/PANDA)<br><br>

[1] Multiresolution Knowledge Distillation for Anomaly Detection<br>
[paper](https://arxiv.org/abs/2011.11108)<br><br>

<a name="KeypointDetection"/> 

### Keypoint Detection

[1] Skeleton Merger: an Unsupervised Aligned Keypoint Detector<br>
[paper](https://arxiv.org/pdf/2103.10814.pdf) | [code](https://github.com/eliphatfs/SkeletonMerger)<br><br>


<br>
<a name="Segmentation"/> 

## Segmentation

<a name="ImageSegmentation"/> 

### Image Segmentation

[14] ATSO: Asynchronous Teacher-Student Optimization for Semi-Supervised Image Segmentation<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Huo_ATSO_Asynchronous_Teacher-Student_Optimization_for_Semi-Supervised_Image_Segmentation_CVPR_2021_paper.pdf)<br><br>

[13] Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Feng_Encoder_Fusion_Network_With_Co-Attention_Embedding_for_Referring_Image_Segmentation_CVPR_2021_paper.pdf)<br><br>

[12] Bottom-Up Shift and Reasoning for Referring Image Segmentation<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Yang_Bottom-Up_Shift_and_Reasoning_for_Referring_Image_Segmentation_CVPR_2021_paper.pdf) | [code](https://github.com/incredibleXM/BUSNet)<br><br>

[11] Every Annotation Counts: Multi-label Deep Supervision for Medical Image Segmentation<br>
[paper](https://arxiv.org/abs/2104.13243)<br><br>

[10] Camouflaged Object Segmentation with Distraction Mining<br>
[paper](https://arxiv.org/abs/2104.10475)<br><br>

[9] Adaptive Prototype Learning and Allocation for Few-Shot Segmentation<br>
[paper](https://arxiv.org/abs/2104.01893)<br><br>

[8] DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation<br>
[paper](https://arxiv.org/abs/2103.15954)<br><br>

[7] Self-Guided and Cross-Guided Learning for Few-Shot Segmentation<br>
[paper](https://arxiv.org/abs/2103.16129)<br><br>

[6] Locate then Segment: A Strong Pipeline for Referring Image Segmentation<br>
[paper](https://arxiv.org/abs/2103.16284)<br><br>

[5] Boundary IoU: Improving Object-Centric Image Segmentation Evaluation<br>
[paper](https://arxiv.org/abs/2103.16562) | [code](https://bowenc0221.github.io/boundary-iou)<br><br>

[4] PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation<br>
[paper](https://arxiv.org/pdf/2103.06564.pdf)<br><br>

[3] FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space<br>
[paper](https://arxiv.org/abs/2103.06030) | [code](https://github.com/liuquande/FedDG-ELCFS)<br><br>

[2] Few-Shot Segmentation Without Meta-Learning: A Good Transductive Inference Is All You Need?<br>
[paper](https://arxiv.org/abs/2012.06166) | [code](https://github.com/mboudiaf/RePRI-for-Few-Shot-Segmentation)<br><br>

[1] Learning Calibrated Medical Image Segmentation via Multi-rater Agreement Modeling<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Ji_Learning_Calibrated_Medical_Image_Segmentation_via_Multi-Rater_Agreement_Modeling_CVPR_2021_paper.pdf) ｜ [code](https://github.com/jiwei0921/MRNet/)<br><br>


<a name="PanopticSegmentation"/> 

### Panoptic Segmentation

[15] Fully Convolutional Networks for Panoptic Segmentation (Oral)<br>
[paper](https://arxiv.org/abs/2012.00720) | [code](https://github.com/Jia-Research-Lab/PanopticFCN)<br><br>

[14] Part-aware Panoptic Segmentation<br>
[paper](https://arxiv.org/abs/2106.06351) | [code](https://github.com/tue-mps/panoptic_parts)<br><br>

[13] LiDAR-based Panoptic Segmentation via Dynamic Shifting Network<br>
[paper](https://arxiv.org/abs/2011.11964) | [code](https://github.com/hongfz16/DS-Net)<br><br>

[12] Learning to Associate Every Segment for Video Panoptic Segmentation<br>
[paper](https://arxiv.org/abs/2106.09453)<br><br>

[11] Hierarchical Lovász Embeddings for Proposal-free Panoptic Segmentation<br>
[paper](https://arxiv.org/abs/2106.04555)<br><br>

[10] Exemplar-Based Open-Set Panoptic Segmentation Network<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Hwang_Exemplar-Based_Open-Set_Panoptic_Segmentation_Network_CVPR_2021_paper.pdf) | [project](https://cv.snu.ac.kr/research/EOPSN)<br><br>

[9] ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Qiao_VIP-DeepLab_Learning_Visual_Perception_With_Depth-Aware_Video_Panoptic_Segmentation_CVPR_2021_paper.pdf) | [code](https://github.com/joe-siyuan-qiao/ViP-DeepLab)<br><br>

[8] LPSNet: A lightweight solution for fast panoptic segmentation<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Hong_LPSNet_A_Lightweight_Solution_for_Fast_Panoptic_Segmentation_CVPR_2021_paper.pdf)<br><br>

[7] Improving Panoptic Segmentation at All Scales<br>
[paper](http://arxiv.org/abs/2012.07717)<br><br>

[6] Toward Joint Thing-and-Stuff Mining for Weakly Supervised Panoptic Segmentation<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Shen_Toward_Joint_Thing-and-Stuff_Mining_for_Weakly_Supervised_Panoptic_Segmentation_CVPR_2021_paper.pdf)<br><br>

[5] MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers<br>
[paper](https://arxiv.org/abs/2012.00759) ｜ [code](https://github.com/google-research/deeplab2)<br><br>

[4] Panoptic Segmentation Forecasting<br>
[paper](https://arxiv.org/abs/2104.03962)<br><br>

[3] Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation<br>
[paper](https://arxiv.org/pdf/2103.14962.pdf)<br><br>

[2] Cross-View Regularization for Domain Adaptive Panoptic Segmentation<br>
[paper](https://arxiv.org/abs/2103.02584)<br><br>

[1] 4D Panoptic LiDAR Segmentation<br>
[paper](https://arxiv.org/abs/2102.12472)<br><br>

<a name="SemanticSegmentation"/> 

### Semantic Segmentation

[49] DCNAS: Densely Connected Neural Architecture Search for Semantic Image Segmentation<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhang_DCNAS_Densely_Connected_Neural_Architecture_Search_for_Semantic_Image_Segmentation_CVPR_2021_paper.pdf)<br><br>

[48] Semi-supervised Semantic Segmentation with Directional Context-aware Consistency<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Lai_Semi-Supervised_Semantic_Segmentation_With_Directional_Context-Aware_Consistency_CVPR_2021_paper.pdf)<br><br>

[47] Scale-Aware Graph Neural Network for Few-Shot Semantic Segmentation<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Xie_Scale-Aware_Graph_Neural_Network_for_Few-Shot_Semantic_Segmentation_CVPR_2021_paper.pdf)<br><br>

[46] Uncertainty Reduction for Model Adaptation in Semantic Segmentation<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/S_Uncertainty_Reduction_for_Model_Adaptation_in_Semantic_Segmentation_CVPR_2021_paper.pdf)<br><br>

[45] Rethinking BiSeNet For Real-time Semantic Segmentation<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Fan_Rethinking_BiSeNet_for_Real-Time_Semantic_Segmentation_CVPR_2021_paper.pdf) | [code](https://github.com/MichaelFan01/STDC-Seg)<br><br>

[44] HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Nirkin_HyperSeg_Patch-Wise_Hypernetwork_for_Real-Time_Semantic_Segmentation_CVPR_2021_paper.pdf) | [code](https://nirkin.com/hyperseg)<br><br>

[43] Complete & Label: A Domain Adaptation Approach to Semantic Segmentation of LiDAR Point Clouds<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Yi_Complete__Label_A_Domain_Adaptation_Approach_to_Semantic_Segmentation_CVPR_2021_paper.pdf)<br><br>

[42] Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Yao_Non-Salient_Region_Object_Mining_for_Weakly_Supervised_Semantic_Segmentation_CVPR_2021_paper.pdf) ｜ [code](https://github.com/NUST-Machine-Intelligence-Laboratory/nsrom)<br><br>

[41] Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhang_Prototypical_Pseudo_Label_Denoising_and_Target_Structure_Learning_for_Domain_CVPR_2021_paper.pdf) | [code](https://github.com/microsoft/ProDA)<br><br>

[40] Few-shot 3D Point Cloud Semantic Segmentation<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhao_Few-Shot_3D_Point_Cloud_Semantic_Segmentation_CVPR_2021_paper.pdf)<br><br>

[39] Embedded Discriminative Attention Mechanism for Weakly Supervised Semantic Segmentation<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Wu_Embedded_Discriminative_Attention_Mechanism_for_Weakly_Supervised_Semantic_Segmentation_CVPR_2021_paper.pdf)<br><br>

[38] Anti-aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Liu_Anti-Aliasing_Semantic_Reconstruction_for_Few-Shot_Semantic_Segmentation_CVPR_2021_paper.pdf) ｜ [code](https://github.com/Bibkiller/ASR)<br><br>

[37] Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Hoyer_Three_Ways_To_Improve_Semantic_Segmentation_With_Self-Supervised_Depth_Estimation_CVPR_2021_paper.pdf) | [code](https://github.com/lhoyer/improving_segmentation_with_selfsupervised_depth)<br><br>

[36] (AF)2-S3Net: Attentive Feature Fusion with Adaptive Feature Selection for Sparse Semantic Segmentation Network<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Cheng_AF2-S3Net_Attentive_Feature_Fusion_With_Adaptive_Feature_Selection_for_Sparse_CVPR_2021_paper.pdf)<br><br>

[35] Exploit Visual Dependency Relations for Semantic Segmentation<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Liu_Exploit_Visual_Dependency_Relations_for_Semantic_Segmentation_CVPR_2021_paper.pdf)<br><br>

[34] Revisiting Superpixels for Active Learning in Semantic Segmentation with Realistic Annotation Costs<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Cai_Revisiting_Superpixels_for_Active_Learning_in_Semantic_Segmentation_With_Realistic_CVPR_2021_paper.pdf)<br><br>

[33] ABMDRNet: Adaptive-weighted Bi-directional Modality Difference Reduction Network for RGB-T Semantic Segmentation<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhang_ABMDRNet_Adaptive-Weighted_Bi-Directional_Modality_Difference_Reduction_Network_for_RGB-T_Semantic_CVPR_2021_paper.pdf)<br><br>

[32] CGA-Net: Category Guided Aggregation for Point Cloud Semantic Segmentation(CGA-Net：点云语义分割的类别引导聚合)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Lu_CGA-Net_Category_Guided_Aggregation_for_Point_Cloud_Semantic_Segmentation_CVPR_2021_paper.pdf)<br><br>

[31] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers(使用 Transformer 从序列到序列的角度重新思考语义分割)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Zheng_Rethinking_Semantic_Segmentation_From_a_Sequence-to-Sequence_Perspective_With_Transformers_CVPR_2021_paper.pdf) | [project](https://fudan-zvg.github.io/SETR)<br><br>

[30] Anti-Adversarially Manipulated Attributions for Weakly and Semi-Supervised Semantic Segmentation(弱和半监督语义分割的反对抗操纵属性)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Lee_Anti-Adversarially_Manipulated_Attributions_for_Weakly_and_Semi-Supervised_Semantic_Segmentation_CVPR_2021_paper.pdf) | [code](https://github.com/jbeomlee93/AdvCAM)<br><br>

[29] Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision(具有交叉伪监督的半监督语义分割)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Chen_Semi-Supervised_Semantic_Segmentation_With_Cross_Pseudo_Supervision_CVPR_2021_paper.pdf)<br><br>

[28] Railroad is not a Train: Saliency as Pseudo-pixel Supervision for Weakly Supervised Semantic Segmentation(铁路不是火车：作为弱监督语义分割的伪像素监督的显著性)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Lee_Railroad_Is_Not_a_Train_Saliency_As_Pseudo-Pixel_Supervision_for_CVPR_2021_paper.pdf) | [code](https://github.com/halbielee/EPS)<br><br>

[27] Cluster, Split, Fuse, and Update: Meta-Learning for Open Compound Domain Adaptive Semantic Segmentation(集群、拆分、融合和更新：开放复合域自适应语义分割的元学习)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Gong_Cluster_Split_Fuse_and_Update_Meta-Learning_for_Open_Compound_Domain_CVPR_2021_paper.pdf)<br><br>

[26] Omni-supervised Point Cloud Segmentation via Gradual Receptive Field Component Reasoning(通过渐进感受野分量推理的全方位监督点云分割)<br>
[paper](https://arxiv.org/abs/2105.10203) | [code](https://github.com/azuki-miho/RFCR)<br><br>

[25] Unsupervised Part Segmentation through Disentangling Appearance and Shape(基于外观和形状分离的无监督零件分割)<br>
[paper](https://arxiv.org/abs/2105.12405)<br><br>

[24] Self-supervised Augmentation Consistency for Adapting Semantic Segmentation(自适应语义分割的自我监督增强一致性)<br>
[paper](https://arxiv.org/abs/2105.00097) | [code](https://github.com/visinf/da-sac)<br><br>

[23] DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation(DANNet：一种用于无监督夜间语义切分的单阶段域自适应网络)<br>
[paper](https://arxiv.org/abs/2104.10834) | [code](https://github.com/W-zx-Y/DANNet)<br><br>

[22] Improving Online Performance Prediction for Semantic Segmentation(改进用于语义分割的在线性能预测)<br>
[paper](https://arxiv.org/abs/2104.05255)<br><br>

[21] Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization(生成模型的语义分割：半监督学习和强大的域外泛化)<br>
[paper](https://arxiv.org/abs/2104.05833) | [code](https://nv-tlabs.github.io/semanticGAN/)<br><br>

[20] Progressive Semantic Segmentation(渐进式语义分割)<br>
[paper](https://arxiv.org/abs/2104.03778) | [code](https://github.com/VinAIResearch/MagNet)<br><br>

[19] InverseForm: A Loss Function for Structured Boundary-Aware Segmentation(结构化边界感知分割的损失函数)<br>
[paper](https://arxiv.org/abs/2104.02745)<br><br>

[18] 3D-to-2D Distillation for Indoor Scene Parsing(用于室内场景解析的3D到2D蒸馏)<br>
[paper](https://arxiv.org/abs/2104.02243)<br><br>

[17] One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation(一键式点击：一种用于弱监督3D语义分割的自训练方法)<br>
[paper](https://arxiv.org/abs/2104.02246)<br><br>

[16] Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation(弱监督语义分割的背景感知池和噪声感知损失)<br>
[paper](https://arxiv.org/abs/2104.00905)<br><br>

[15] PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in Clustering(PiCIE：在聚类中使用不变性和等方差的无监督语义分割)<br>
[paper](https://arxiv.org/abs/2103.17070) | [code](https://github.com/janghyuncho/PiCIE)<br><br>

[14] Source-Free Domain Adaptation for Semantic Segmentation(用于语义分割的无源域自适应)<br>
[paper](https://arxiv.org/abs/2103.16372)<br><br>

[13] RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening(通过实例选择性增白提高城市场景分割中的域泛化)<br>
[paper](https://arxiv.org/abs/2103.15597) | [code](https://github.com/shachoi/RobustNet)<br><br>

[12] Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization(具有光度对齐和类别中心正则化的粗到细域自适应语义分割)<br>
[paper](https://arxiv.org/abs/2103.13041)<br><br>

[11] Cross-Dataset Collaborative Learning for Semantic Segmentation(跨数据集协同学习的语义分割)<br>
[paper](https://arxiv.org/abs/2103.11351)<br><br>

[10] BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation(用于弱监督语义和实例细分的边界框归因图)<br>
[paper](https://arxiv.org/abs/2103.08907)<br><br>

[9] Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations(通过稀疏和纠缠的潜在表示的排斥力进行连续语义分割)<br>
[paper](https://arxiv.org/abs/2103.06342)<br><br>

[8] Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion(通过双边扩充和自适应融合对实点云场景进行语义分割)<br>
[paper](https://arxiv.org/abs/2103.07074)<br><br>

[7] Capturing Omni-Range Context for Omnidirectional Segmentation(捕获全方位上下文进行全方位分割)<br>
[paper](https://arxiv.org/abs/2103.05687)<br><br>

[6] MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation(MetaCorrection：语义分割中无监督域自适应的域感知元丢失校正)<br>
[paper](https://arxiv.org/abs/2103.05254)<br><br>

[5] Learning Statistical Texture for Semantic Segmentation(学习用于语义分割的统计纹理)<br>
[paper](https://arxiv.org/abs/2103.04133)<br><br>

[4] Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation(基于双层域混合的半监督域自适应语义分割)<br>
[paper](https://arxiv.org/pdf/2103.04705.pdf)<br><br>

[3] Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation(多源领域自适应与协作学习的语义分割)<br>
[paper](https://arxiv.org/abs/2103.04717)<br><br>

[2] Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges(走向城市规模3D点云的语义分割：数据集，基准和挑战)<br>
[paper](https://arxiv.org/abs/2009.03137) | [code](https://github.com/QingyongHu/SensatUrban)<br><br>

[1] PLOP: Learning without Forgetting for Continual Semantic Segmentation（PLOP：学习而不会忘记连续的语义分割）<br>
[paper](https://arxiv.org/abs/2011.11390) | [code](https://github.com/arthurdouillard/CVPR2021_PLOP)<br><br>

<a name="InstanceSegmentation"/> 

### Instance Segmentation

[24] BoxInst: High-Performance Instance Segmentation with Box Annotations(BoxInst：带框注释的高性能实例分割)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Tian_BoxInst_High-Performance_Instance_Segmentation_With_Box_Annotations_CVPR_2021_paper.pdf) | [code](https://git.io/AdelaiDet)<br><br>

[23] Unsupervised Discovery of the Long-Tail in Instance Segmentation Using Hierarchical Self-Supervision(使用分层自监督在实例分割中无监督地发现长尾)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Weng_Unsupervised_Discovery_of_the_Long-Tail_in_Instance_Segmentation_Using_Hierarchical_CVPR_2021_paper.pdf)<br><br>

[22] Seesaw Loss for Long-Tailed Instance Segmentation(长尾实例分割的跷跷板损失)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Wang_Seesaw_Loss_for_Long-Tailed_Instance_Segmentation_CVPR_2021_paper.pdf) | [code](https://github.com/open-mmlab/mmdetection)<br><br>

[21] ColorRL: Reinforced Coloring for End-to-End Instance Segmentation(ColorRL：端到端实例分割的强化着色)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Tuan_ColorRL_Reinforced_Coloring_for_End-to-End_Instance_Segmentation_CVPR_2021_paper.pdf)<br><br>

[20] SG-Net: Spatial Granularity Network for One-Stage Video Instance Segmentation(SG-Net：用于单阶段视频实例分割的空间粒度网络)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Liu_SG-Net_Spatial_Granularity_Network_for_One-Stage_Video_Instance_Segmentation_CVPR_2021_paper.pdf)<br><br>

[19] Deeply Shape-guided Cascade for Instance Segmentation(用于实例分割的深度形状引导级联)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Ding_Deeply_Shape-Guided_Cascade_for_Instance_Segmentation_CVPR_2021_paper.pdf) | [code](https://github.com/hding2455/DSC)<br><br>

[18] DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation(DCT-Mask：用于实例分割的离散余弦变换掩码表示)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Shen_DCT-Mask_Discrete_Cosine_Transform_Mask_Representation_for_Instance_Segmentation_CVPR_2021_paper.pdf)<br><br>

[17] Point Cloud Instance Segmentation using Probabilistic Embeddings(使用概率嵌入的点云实例分割)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhang_Point_Cloud_Instance_Segmentation_Using_Probabilistic_Embeddings_CVPR_2021_paper.pdf)<br><br>

[16] Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation(简单的复制粘贴是一种用于实例分割的强数据增强方法)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Ghiasi_Simple_Copy-Paste_Is_a_Strong_Data_Augmentation_Method_for_Instance_CVPR_2021_paper.pdf) | [code](https://github.com/tensorflow/tpu/tree/master/models/official/detection/projects/copy_paste)<br><br>

[15] Zero-Shot Instance Segmentation(零样本实例分割)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Zheng_Zero-Shot_Instance_Segmentation_CVPR_2021_paper.pdf)<br><br>

[14] DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution(DyCo3D：通过动态卷积对 3D 点云进行稳健的实例分割)<br>
[paper](https://arxiv.org/abs/2011.13328) | [code](https://git.io/DyCo3D)<br><br>

[13] Incremental Few-Shot Instance Segmentation(增量小样本实例分割)<br>
[paper](https://arxiv.org/abs/2105.05312) | [code](https://github.com/danganea/iMTFA)<br><br>

[12] Robust Instance Segmentation through Reasoning about Multi-Object Occlusion(通过推理多对象遮挡进行鲁棒的实例分割)<br>
[paper](https://arxiv.org/abs/2012.02107) | [code](https://github.com/XD7479/Multi-Object-Occlusion)<br><br>

[11] A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation(A ^ 2-FPN：基于注意力聚合的特征金字塔网络，用于实例分割)<br>
[paper](https://arxiv.org/abs/2105.03186)<br><br>

[10] RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features(RefineMask：通过细粒度功能实现高质量实例分割)<br>
[paper](https://arxiv.org/abs/2104.08569) | [code](https://github.com/zhanggang001/RefineMask)<br><br>

[9] Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation(看起来更接近以更好地分割：用于实例分割的边界补丁优化)<br>
[paper](https://arxiv.org/abs/2104.05239) | [code](https://github.com/tinyalpha/BPR)<br><br>

[8] Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation(空间特征校准和时间融合，以实现有效的一级视频实例分割)<br>
[paper](https://arxiv.org/abs/2104.05606) | [code](https://github.com/MinghanLi/STMask)<br><br>

[7] DARCNN: Domain Adaptive Region-based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images(DARCNN：用于生物医学图像中无监督实例分割的基于域自适应区域的卷积神经网络)<br>
[paper](https://arxiv.org/abs/2104.01325)<br><br>

[6] Weakly-supervised Instance Segmentation via Class-agnostic Learning with Salient Images(通过带有显着图像的类不可知学习进行弱监督实例分割)<br>
[paper](https://arxiv.org/abs/2104.01526) | [code](https://github.com/hustvl/BoxCaseg)<br><br>

[5] FAPIS: A Few-shot Anchor-free Part-based Instance Segmenter(FAPIS：少量基于无锚的基于实例分割器)<br>
[paper](https://arxiv.org/abs/2104.00073)<br><br>

[4] Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency(具有时间掩码一致性的视频的弱监督实例分割)<br>
[paper](https://arxiv.org/abs/2103.12886)<br><br>

[3] Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers(具有重叠BiLayer的深度遮挡感知实例分割)<br>
[paper](https://arxiv.org/abs/2103.12340) | [code](https://github.com/lkeab/BCNet)<br><br>

[2] BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation(用于弱监督语义和实例细分的边界框归因图)<br>
[paper](https://arxiv.org/abs/2103.08907) | [code](https://github.com/jbeomlee93/BBAM)<br><br>

[1] End-to-End Video Instance Segmentation with Transformers(使用Transformer的端到端视频实例分割)<br>
[paper](https://arxiv.org/abs/2011.14503) | [code](https://github.com/Epiphqny/VisTR)<br><br>

<a name="Superpixel"/> 

## Superpixel

[2] Revisiting Superpixels for Active Learning in Semantic Segmentation with Realistic Annotation Costs(在具有现实注释成本的语义分割中重新审视用于主动学习的超像素)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Cai_Revisiting_Superpixels_for_Active_Learning_in_Semantic_Segmentation_With_Realistic_CVPR_2021_paper.pdf)<br><br>

[1] Learning the Superpixel in a Non-iterative and Lifelong Manner(以非迭代和终身的方式学习超像素)<br>
[paper](https://arxiv.org/pdf/2103.10681.pdf)<br><br>

<a name="VOS"/> 

### Video Object Segmentation

[12] Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation(在半监督视频对象分割中使用重用门函数学习动态网络)<br>
[paper](https://arxiv.org/abs/2012.11655) | [code](https://github.com/HYOJINPARK/Reuse_VOS)<br><br>

[11] SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation(Oral|SSTVOS：用于视频对象分割的稀疏时空变换器)<br>
[paper](https://arxiv.org/abs/2101.08833) | [code](https://github.com/dukebw/SSTVOS)<br><br>

[10] SwiftNet: Real-time Video Object Segmentation(SwiftNet：实时视频对象分割)<br>
[paper](https://arxiv.org/abs/2102.04604) | [code](https://github.com/haochenheheda/SwiftNet)<br><br>

[9] Video Object Segmentation Using Global and Instance Embedding Learning(使用全局和实例嵌入学习的视频对象分割)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Ge_Video_Object_Segmentation_Using_Global_and_Instance_Embedding_Learning_CVPR_2021_paper.pdf)<br><br>

[8] Delving Deep into Many-to-many Attention for Few-shot Video Object Segmentation(深入研究小样本视频对象分割的多对多注意力)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Chen_Delving_Deep_Into_Many-to-Many_Attention_for_Few-Shot_Video_Object_Segmentation_CVPR_2021_paper.pdf) | [code](https://github.com/scutpaul/DANet)<br><br>

[7] Reciprocal Transformations for Unsupervised Video Object Segmentation(无监督视频对象分割的互易变换)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Ren_Reciprocal_Transformations_for_Unsupervised_Video_Object_Segmentation_CVPR_2021_paper.pdf) | [code](https://github.com/OliverRensu/RTNet)<br><br>

[6] Learning Position and Target Consistency for Memory-based Video Object Segmentation(基于内存的视频对象分割的学习位置和目标一致性)<br>
[paper](https://arxiv.org/abs/2104.04329)<br><br>

[5] Guided Interactive Video Object Segmentation Using Reliability-Based Attention Maps(基于可靠性的注意映射引导交互式视频对象分割)<br>
[paper](https://arxiv.org/abs/2104.10386) | [code](https://github.com/yuk6heo/GIS-RAmap)<br><br>

[4] Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation(无监督视频多对象分割的目标感知对象发现和关联)<br>
[paper](https://arxiv.org/abs/2104.04782)<br><br>

[3] Efficient Regional Memory Network for Video Object Segmentation(用于视频对象分割的高效区域存储网络)<br>
[paper](https://arxiv.org/abs/2103.12934) | [code](https://haozhexie.com/project/rmnet)<br><br>

[2] Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild(学习推荐帧用于交互式野外视频对象分割)<br>
[paper](https://arxiv.org/pdf/2103.10391.pdf) | [code](https://github.com/svip-lab/IVOS-W)<br><br>

[1] Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion(模块化交互式视频对象分割：面具交互，传播和差异感知融合)<br>
[paper](https://arxiv.org/pdf/2103.07941.pdf) | [project](https://hkchengrex.github.io/MiVOS/)<br><br>

<a name="Matting"/> 

### Matting

[7] Deep Video Matting via Spatio-Temporal Alignment and Aggregation(通过时空对齐和聚合的深度视频抠图)<br>
[paper](https://arxiv.org/abs/2104.11208)<br><br>

[6] Learning Affinity-Aware Upsampling for Deep Image Matting(学习用于深度图像抠图的亲和感知上采样)<br>
[paper](https://arxiv.org/abs/2011.14288)<br><br>

[5] Semantic Image Matting(语义图像抠图)<br>
[paper](https://arxiv.org/abs/2104.08201) | [code](https://github.com/nowsyn/SIM)<br><br>

[4] Mask Guided Matting via Progressive Refinement Network(通过渐进式细化网络的掩码引导抠图)<br>
[paper](https://arxiv.org/abs/2012.06722) | [code](https://github.com/yucornetto/MGMatting)<br><br>

[3] Omnimatte: Associating Objects and Their Effects in Video(Omnimatte：在视频中关联对象及其效果)<br>
[paper](https://arxiv.org/abs/2105.06993) | [project](https://omnimatte.github.io/)<br><br>

[2] Improved Image Matting via Real-time User Clicks and Uncertainty Estimation(通过实时用户点击和不确定性估计改善图像抠图)<br>
[paper](https://arxiv.org/abs/2012.08323)<br><br>

[1] Real-Time High Resolution Background Matting<br>
[paper](https://arxiv.org/abs/2012.07810) | [code](https://github.com/PeterL1n/BackgroundMattingV2) | [project](https://grail.cs.washington.edu/projects/background-matting-v2/) | [video](https://youtu.be/oMfPTeYDF9g)<br><br>

<a name="DensePrediction"/> 

### Dense Prediction

[3] Generic Perceptual Loss for Modeling Structured Output Dependencies(用于建模结构化输出依存关系的一般感知损失)<br>
[paper](https://arxiv.org/pdf/2103.10571.pdf)<br><br>

[2] Densely connected multidilated convolutional networks for dense prediction tasks（用于密集预测任务的多重卷积连接网络）<br>
[paper](https://arxiv.org/abs/2011.11844)<br><br>

[1] Dense Contrastive Learning for Self-Supervised Visual Pre-Training(自监督视觉预训练的密集对比学习)<br>
[paper](https://arxiv.org/abs/2011.09157) | [code](https://github.com/WXinlong/DenseCL)<br><br>

<a name="Estimation"/> 

## Estimation

<a name="HumanPoseEstimation"/> 

### Human Pose Estimation

[23] Beyond Static Features for Temporally Consistent 3D Human Pose and Shape from a Video(超越视频中时间一致的 3D 人体姿势和形状的静态特征)<br>
[paper](https://arxiv.org/abs/2011.08627) | [code](https://github.com/hongsukchoi/TCMR_RELEASE) | [video](https://youtu.be/WB3nTnSQDII)<br><br>

[22] ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search(ViPNAS：通过神经架构搜索进行高效的视频姿态估计)<br>
[paper](https://arxiv.org/abs/2105.10154)<br><br>

[21] When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks(当人体姿势估计达到稳健性时：对抗算法和基准)<br>
[paper](https://arxiv.org/abs/2105.06152)<br><br>

[20] Monocular Real-time Full Body Capture with Inter-part Correlations(具有部分间相关性的单眼实时全身捕获)<br>
[paper](https://arxiv.org/abs/2012.06087)<br><br>

[19] Unsupervised Human Pose Estimation through Transforming Shape Templates(通过变换形状模板的无监督人体姿势估计)<br>
[paper](https://arxiv.org/abs/2105.04154) | [project](https://infantmotion.github.io/)<br><br>

[18] Body Meshes as Points(身体网格物体为点)<br>
[paper](https://arxiv.org/abs/2105.02467) | [code](https://github.com/jfzhang95/BMP)<br><br>

[17] PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation(PoseAug：用于3D人类姿势估计的可微分姿势增强框架)<br>
[paper](https://arxiv.org/abs/2105.02465) | [code](https://github.com/jfzhang95/PoseAug)<br><br>

[16] AGORA: Avatars in Geography Optimized for Regression Analysis(AGORA：针对回归分析进行了优化的地理头像)<br>
[paper](https://arxiv.org/abs/2104.14643) | [project](https://agora.is.tue.mpg.de)<br><br>

[15] Locally Aware Piecewise Transformation Fields for 3D Human Mesh Registration(用于3D人体网格配准的局部感知分段变换字段)<br>
[paper](https://arxiv.org/abs/2104.08160) | [code](https://taconite.github.io/PTF/website/PTF.html)<br><br>

[14] Pose Recognition with Cascade Transformers(级联Transformer的姿势识别)<br>
[paper](https://arxiv.org/abs/2104.06976) | [code](https://github.com/mlpc-ucsd/PRTR)<br><br>

[13] Lite-HRNet: A Lightweight High-Resolution Network(Lite-HRNet：轻巧的高分辨率网络)<br>
[paper](https://arxiv.org/abs/2104.06403) | [code](https://github.com/HRNet/Lite-HRNet)<br><br>

[12] Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo(具有平面扫描立体声的多视图多人3D姿势估计)<br>
[paper](https://arxiv.org/abs/2104.02273) | [code](https://github.com/jiahaoLjh/PlaneSweepPose)<br><br>

[11] Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression(通过解聚的关键点自下而上的人体姿势估计)<br>
[paper](https://arxiv.org/abs/2104.02300) | [code](https://github.com/HRNet/DEKR)<br><br>

[10] Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks(通过集成自上而下和自下而上的网络进行单眼3D多人姿势估计)<br>
[paper](https://arxiv.org/abs/2104.01797) | [code](https://github.com/3dpose/3D-Multi-Person-Pose)<br><br>

[9] Reconstructing 3D Human Pose by Watching Humans in the Mirror(通过照镜子中的人来重建3D人的姿势)<br>
[paper](https://arxiv.org/abs/2104.00340) | [project](https://zju3dv.github.io/Mirrored-Human/)<br><br>

[8] SimPoE: Simulated Character Control for 3D Human Pose Estimation(用于3D人体姿势估计的模拟角色控制)<br>
[paper](https://arxiv.org/abs/2104.00683) | [project](https://www.ye-yuan.com/simpoe/)<br><br>

[7] Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors(人体姿势系统（HPS）：人体安装传感器在大场景中的3D人体姿势估计和自定位)<br>
[paper](https://arxiv.org/abs/2103.17265) | [project](http://virtualhumans.mpi-inf.mpg.de/hps/)<br><br>

[6] Graph Stacked Hourglass Networks for 3D Human Pose Estimation(用于3D人体姿势估计的图形堆叠沙漏网络)<br>
[paper](https://arxiv.org/abs/2103.16385)<br><br>

[5] From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation(【动物姿态估计】从合成到真实：用于动物姿势估计的无监督域自适应)<br>
[paper](https://arxiv.org/abs/2103.14843) | [code](https://github.com/chaneyddtt/UDA-Animal-Pose)<br><br>

[4] DCPose: Deep Dual Consecutive Network for Human Pose Estimation(用于人体姿态估计的深度双重连续网络)<br>
[paper](https://arxiv.org/abs/2103.07254) | [code](https://github.com/Pose-Group/DCPose)<br><br>

[3] Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing(用于实例感知人类语义解析的可微分多粒度人类表示学习)<br>
[paper](https://arxiv.org/pdf/2103.04570.pdf) | [code](https://github.com/tfzhou/MG-HumanParsing)<br><br>

[2] CanonPose: Self-supervised Monocular 3D Human Pose Estimation in the Wild（野外自监督的单眼3D人类姿态估计）<br><br>

[1] PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective Crop Layers（具有透视作物层的3D姿势的几何感知神经重建）<br>
[paper](https://arxiv.org/abs/2011.13607)<br><br>

<a name="GestureEstimation"/> 


### Gesture Estimation

[6] Improving Sign Language Translation with Monolingual Data by Sign Back-Translation(【手势翻译】基于单语数据的手语翻译研究)<br>
[paper](https://arxiv.org/abs/2105.12397)<br><br>

[5] ContactOpt: Optimizing Contact to Improve Grasps(ContactOpt：优化联系人以提高抓地力)<br>
[paper](https://arxiv.org/abs/2104.07267)<br><br>

[4] Fingerspelling Detection in American Sign Language(美国手语中的手指拼写检测)<br>
[paper](https://arxiv.org/abs/2104.01291)<br><br>

[3] Read and Attend: Temporal Localisation in Sign Language Videos(阅读和参加：手语视频中的时间本地化)<br>
[paper](https://arxiv.org/abs/2103.16481) | [project](https://www.robots.ox.ac.uk/~vgg/research/bslattend/)<br><br>

[2] Skeleton Based Sign Language Recognition Using Whole-body Keypoints(基于全身关键点的基于骨架的手语识别)<br>
[paper](https://arxiv.org/abs/2103.08833) | [code](https://github.com/jackyjsy/CVPR21Chal-SLR)<br><br>

[1] Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration(基于语义聚合和自适应2D-1D配准的相机空间手部网格恢复)<br>
[paper](https://arxiv.org/pdf/2103.02845.pdf) | [code](https://github.com/SeanChenxy/HandMesh)<br><br>

<a name="Flow/Pose/MotionEstimation"/> 

### Optical Flow/Pose/Motion Estimation

[17] Exploring intermediate representation for monocular vehicle pose estimation(探索单目车辆姿态估计的中间表示)<br>
[paper](https://arxiv.org/abs/2011.08464) | [code](https://github.com/Nicholasli1995/EgoNet)<br><br>

[16] Extreme Rotation Estimation using Dense Correlation Volumes(使用密集相关体积的极端旋转估计)<br>
[paper](https://arxiv.org/abs/2104.13530) | [project](https://ruojincai.github.io/ExtremeRotation/)<br><br>

[15] Motion Representations for Articulated Animation(【运动估计&表示】关节动画的运动表示)<br>
[paper](https://arxiv.org/abs/2104.11280) | [code](https://github.com/snap-research/articulated-animation)<br><br>

[14] Self-Supervised Pillar Motion Learning for Autonomous Driving(【运动估计】用于自动驾驶的自我监督支柱运动学习)<br>
[paper](https://arxiv.org/abs/2104.08683)<br><br>

[13] Single-view robot pose and joint angle estimation via render & compare(通过渲染和比较进行单视图机器人姿态和关节角度估计)<br>
[paper](https://arxiv.org/abs/2104.09359) | [code](https://www.di.ens.fr/willow/research/robopose/)<br><br>

[12] Fusing the Old with the New: Learning Relative Camera Pose with Geometry-Guided Uncertainty(新旧融合：通过几何引导的不确定性学习相对相机姿势)<br>
[paper](https://arxiv.org/abs/2104.08278)<br><br>

[11] VOLDOR: Visual Odometry from Log-logistic Dense Optical flow Residuals(【视觉测距】VOLDOR：来自对数逻辑密集光流残差的视觉里程表)<br>
[paper](https://arxiv.org/abs/2104.06789)<br><br>

[10] DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-scale Consistency(【6D位姿估计】通过双尺度一致性学习6DoF对象姿势估计)<br>
[paper](https://arxiv.org/abs/2104.03658)<br><br>

[9] Learning optical flow from still images(【光流估计】从静止图像中学习光流)<br>
[paper](https://arxiv.org/abs/2104.03965) | [project](https://mattpoggi.github.io/projects/cvpr2021aleotti/)<br><br>

[8] Learning Optical Flow from a Few Matches(【光流估计】通过少量匹配学习光流)<br>
[paper](https://arxiv.org/abs/2104.02166) | [code](https://github.com/zacjiang/scv)<br><br>

[7] FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds(【光流估计】FESTA：场景点云通过时空注意进行光流估计)<br>
[paper](https://arxiv.org/abs/2104.00798)<br><br>

[6] Wide-Depth-Range 6D Object Pose Estimation in Space(【6D位姿估计】空间中的深度范围6D对象姿态估计)<br>
[paper](https://arxiv.org/abs/2104.00337)<br><br>

[5] Deep Two-View Structure-from-Motion Revisited(重新审视运动的深层两视图结构)<br>
[paper](https://arxiv.org/abs/2104.00556)<br><br>

[4] FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism(【6D位姿估计】具有分离旋转机制的类别级6D对象姿势估计的快速基于形状的网络)<br>
[paper](https://arxiv.org/abs/2103.07054) | [code](https://github.com/DC1991/FS_Net)<br><br>

[3] GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation(【6D位姿估计】用于单眼6D对象姿态估计的几何引导直接回归网络)<br>
[paper](https://arxiv.org/abs/2102.12145) | [code](https://github.com/THU-DA-6D-Pose-Group/GDR-Net)<br><br>

[2] Robust Neural Routing Through Space Partitions for Camera Relocalization in Dynamic Indoor Environments(在动态室内环境中，通过空间划分的鲁棒神经路由可实现摄像机的重新定位)<br>
[paper](https://arxiv.org/abs/2012.04746) | [project](https://ai.stanford.edu/~hewang/)<br><br>

[1] MultiBodySync: Multi-Body Segmentation and Motion Estimation via 3D Scan Synchronization(通过3D扫描同步进行多主体分割和运动估计)<br>
[paper](https://arxiv.org/pdf/2101.06605.pdf) | [code](https://github.com/huangjh-pub/multibody-sync)<br><br>

<a name="DepthEstimation"/> 

### Depth Estimation

[18] MonoRec: Semi-Supervised Dense Reconstruction in Dynamic Environments from a Single Moving Camera(MonoRec：在动态环境中从单个移动相机进行半监督密集重建)<br>
[paper](https://arxiv.org/abs/2011.11814) | [project](https://vision.in.tum.de/research/monorec)<br><br>

[17] Single Image Depth Estimation using Wavelet Decomposition(使用小波分解的单幅图像深度估计)<br>
[paper](https://arxiv.org/abs/2106.02022) | [code](https://github.com/nianticlabs/wavelet-monodepth)<br><br>

[16] Self-Supervised Multi-Frame Monocular Scene Flow(自监督多帧单眼场景流)<br>
[paper](https://arxiv.org/abs/2105.02216) | [code](https://github.com/visinf/multi-mono-sf)<br><br>

[15] Binary TTC: A Temporal Geofence for Autonomous Navigation(【接触时间估计】二进制TTC：自主导航的时空地理围栏)<br>
[paper](https://arxiv.org/abs/2101.04777)<br><br>

[14] The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth(时间机会主义者：自我监督的多帧单眼深度)<br>
[paper](https://arxiv.org/abs/2104.14540)<br><br>

[13] Lighting, Reflectance and Geometry Estimation from 360° Panoramic Stereo(360°全景立体的光照、反射率和几何估计)<br>
[paper](https://arxiv.org/abs/2104.09886) | [code](https://github.com/junxuan-li/LRG_360Panoramic)<br><br>

[12] Depth Completion using Plane-Residual Representation(使用平面残差表示法的深度补全)<br>
[paper](https://arxiv.org/abs/2104.07350)<br><br>

[11] StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision(通过立体视觉进行深度感知的布景人类数字化)<br>
[paper](https://arxiv.org/abs/2104.05289) | [project](http://crishy1995.github.io/StereoPIFuProject)<br><br>

[10] Self-supervised Learning of Depth Inference for Multi-view Stereo(多视图立体声深度推理的自我监督学习)<br>
[paper](https://arxiv.org/abs/2104.02972) | [code](https://github.com/JiayuYANG/Self-supervised-CVP-MVSNet)<br><br>

[9] Depth Completion with Twin Surface Extrapolation at Occlusion Boundaries(遮挡边界处的深度补全和双曲面外推)<br>
[paper](https://arxiv.org/abs/2104.02253)<br><br>

[8] S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation(学习通用的深度特定的结构表示)<br>
[paper](https://arxiv.org/abs/2104.00877) | [code](https://github.com/microsoft/S2R-DepthNet)<br><br>

[7] RGB-D Local Implicit Function for Depth Completion of Transparent Objects(RGB-D局部隐式函数用于透明对象的深度补全)<br>
[paper](https://arxiv.org/abs/2104.00622) | [code](https://research.nvidia.com/publication/2021-03_RGB-D-Local-Implicit)<br><br>

[6] LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth Rendering(通过可分辨深度渲染进行单眼360布局估算)<br>
[paper](https://arxiv.org/abs/2104.00568) | [project](https://fuenwang.ml/project/led2net)<br><br>

[5] Deep Two-View Structure-from-Motion Revisited(重新审视运动的深层两视图结构)<br>
[paper](https://arxiv.org/abs/2104.00556)<br><br>

[4] Mask-ToF: Learning Microlens Masks for Flying Pixel Correction in Time-of-Flight Imaging(学习微透镜掩模以在飞行时间成像中进行飞行像素校正)<br>
[paper](https://arxiv.org/abs/2103.16693) | [project](https://light.princeton.edu/publication/mask-tof)<br><br>

[3] Generalizing to the Open World: Deep Visual Odometry with Online Adaptation(推广到开放世界：具有在线适应功能的深度视觉里程表)<br>
[paper](https://arxiv.org/abs/2103.15279)<br><br>

[2] Beyond Image to Depth: Improving Depth Prediction using Echoes(超越图像深度：使用回声改善深度预测)<br>
[paper](https://arxiv.org/pdf/2103.08468.pdf) | [code](https://krantiparida.github.io/projects/bimgdepth.html)<br><br>


[1] PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss(具有神经位置编码和蒸馏消光损耗的自我监督单视图深度估计的像素级精度)<br>
[paper](https://arxiv.org/abs/2103.07362)<br><br>

<br>

<a name="ImageProcessing"/> 


## Image Processing

[1] Invertible Image Signal Processing(可逆图像信号处理)<br>
[paper](https://arxiv.org/abs/2103.15061) | [code](https://github.com/yzxing87/Invertible-ISP)<br><br>

<a name="SuperResolution"/> 

### Super Resolution

[10] KOALAnet: Blind Super-Resolution using Kernel-Oriented Adaptive Local Adjustment(KOALAnet：使用面向内核的自适应局部调整的盲超分辨率)<br>
[paper](https://arxiv.org/abs/2012.08103)<br><br>

[9] BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond<br>
[paper](https://arxiv.org/abs/2012.02181)<br><br>

[8] Temporal Modulation Network for Controllable Space-Time Video Super-Resolution(可控时空视频超分辨率的时间调制网络)<br>
[paper](https://arxiv.org/abs/2104.10642) | [code](https://github.com/CS-GangXu/TMNet)<br><br>

[7] SRWarp: Generalized Image Super-Resolution under Arbitrary Transformation(SRWarp：任意变换下的广义图像超分辨率)<br>
[paper](https://arxiv.org/abs/2104.10325)<br><br>

[6] Unsupervised Degradation Representation Learning for Blind Super-Resolution(盲超分辨率的无监督退化表示学习)<br>
[paper](https://arxiv.org/abs/2104.00416) | [code](https://github.com/LongguangWang/DASR)<br><br>

[5] Flow-based Kernel Prior with Application to Blind Super-Resolution(基于流的内核先于盲超分辨率的应用)<br>
[paper](https://arxiv.org/abs/2103.15977) | [code](https://github.com/JingyunLiang/FKP)<br><br>

[4] ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic(通过数据特征加速超分辨率网络的通用框架)<br>
[paper](https://arxiv.org/abs/2103.04039) | [explainer: ClassSR speeds up image super-resolution, cutting computation by 50% without degrading SR performance](https://zhuanlan.zhihu.com/p/355873199)<br><br>

[3] Learning Continuous Image Representation with Local Implicit Image Function(通过局部隐含图像功能学习连续图像表示)<br>
[paper](https://arxiv.org/abs/2012.09161) | [code](https://github.com/yinboc/liif) | [video](https://youtu.be/6f2roieSY_8) | [project](https://yinboc.github.io/liif/)<br><br>

[2] Data-Free Knowledge Distillation For Image Super-Resolution(an SR version of the DAFL algorithm)<br><br>

[1] AdderSR: Towards Energy Efficient Image Super-Resolution(applying adder networks to image super-resolution)<br>
[paper](https://arxiv.org/pdf/2009.08891.pdf) | [code](https://github.com/huawei-noah/AdderNet)<br>
Explainer: [Huawei open-sources its adder neural network](https://zhuanlan.zhihu.com/p/113536045)<br><br>

<a name="ImageRestoration"/> 

### Image Restoration/Image Enhancement

[3] Removing Diffraction Image Artifacts in Under-Display Camera via Dynamic Skip Connection Network(利用动态跳跃连接网络消除显示下摄像机衍射图像伪影)<br>
[paper](https://arxiv.org/abs/2104.09556)<br><br>

[2] NeX: Real-time View Synthesis with Neural Basis Expansion(NeX：具有神经基础扩展的实时视图合成)<br>
[paper](https://arxiv.org/abs/2103.05606) | [project](https://nex-mpi.github.io/)<br><br>

[1] Multi-Stage Progressive Image Restoration(多阶段渐进式图像复原)<br>
[paper](https://arxiv.org/abs/2102.02808) | [code](https://github.com/swz30/MPRNet)<br><br>


<a name="ISR"/> 

### Image Shadow Removal/Image Reflection Removal

[3] From Shadow Generation to Shadow Removal(从阴影生成到阴影去除)<br>
[paper](https://arxiv.org/abs/2103.12997)<br><br>

[2] Robust Reflection Removal with Reflection-free Flash-only Cues(通过无反射的仅含Flash线索进行鲁棒的反射去除)<br>
[paper](https://arxiv.org/pdf/2103.04273.pdf) | [code](https://github.com/ChenyangLEI/flash-reflection-removal)<br><br>

[1] Auto-Exposure Fusion for Single-Image Shadow Removal(用于单幅图像阴影去除的自动曝光融合)<br>
[paper](https://arxiv.org/abs/2103.01255) | [code](https://github.com/tsingqguo/exposure-fusion-shadow-removal)<br><br>



<a name="ImageDenoising"/> 

### Image Denoising/Deblurring/Deraining/Dehazing

[8] FBI-Denoiser: Fast Blind Image Denoiser for Poisson-Gaussian Noise(FBI-Denoiser：泊松-高斯噪声的快速盲图像降噪器)<br>
[paper](https://arxiv.org/abs/2105.10967) | [code](https://github.com/csm9493/FBI-Denoiser)<br><br>

[7] Deep Denoising of Flash and No-Flash Pairs for Photography in Low-Light Environments(弱光环境下用于摄影的闪光灯和非闪光灯对的深度降噪)<br>
[paper](https://arxiv.org/abs/2012.05116) | [project](https://www.cse.wustl.edu/~zhihao.xia/deepfnf/)<br><br>

[6] Digital Gimbal: End-to-end Deep Image Stabilization with Learnable Exposure Times(数码云台：具有可学习的曝光时间的端到端深度图像稳定)<br>
[paper](https://arxiv.org/abs/2012.04515)<br><br>

[5] Contrastive Learning for Compact Single Image Dehazing(紧凑型单图像去雾的对比学习)<br>
[paper](https://arxiv.org/abs/2104.09367) | [code](https://github.com/GlassyWu/AECR-Net)<br><br>

[4] Explore Image Deblurring via Blur Kernel Space(通过模糊内核空间探索图像去模糊)<br>
[paper](https://arxiv.org/abs/2104.00317)<br><br>

[3] Semi-Supervised Video Deraining with Dynamic Rain Generator(带动态雨水产生器的半监督视频去雨)<br>
[paper](https://arxiv.org/abs/2103.07939)<br><br>

[2] ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring(学习用于视频去模糊的全范围体积对应)<br>
[paper](https://arxiv.org/pdf/2103.04260.pdf)<br><br>

[1] DeFMO: Deblurring and Shape Recovery of Fast Moving Objects(快速移动物体的去模糊和形状恢复)<br>
[paper](https://arxiv.org/abs/2012.00595) | [code](https://github.com/rozumden/DeFMO) | [video](https://www.youtube.com/watch?v=pmAynZvaaQ4)<br><br>

<a name="ImageEdit"/> 

### Image Editing/Inpainting

[11] PD-GAN: Probabilistic Diverse GAN for Image Inpainting(PD-GAN：用于图像修复的概率多样GAN)<br>
[paper](https://arxiv.org/abs/2105.02201)<br><br>

[10] StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing(StyleMapGAN：利用GAN中潜在的空间维度进行实时图像编辑)<br>
[paper](https://arxiv.org/abs/2104.14754) | [code](https://github.com/naver-ai/StyleMapGAN)<br><br>

[9] Image Inpainting with External-internal Learning and Monochromic Bottleneck(具有内在内在学习和单色瓶颈的图像修复)<br>
[paper](https://arxiv.org/abs/2104.09068)<br><br>

[8] TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations(通过合并多个颜色和空间变换进行参考引导的图像修复)<br>
[paper](https://arxiv.org/abs/2103.15982)<br><br>

[7] DeFLOCNet: Deep Image Editing via Flexible Low-level Controls(通过灵活的低级控件进行深度图像编辑)<br>
[paper](https://arxiv.org/abs/2103.12723)<br><br>

[6] Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE(使用分层VQ-VAE生成图像修复的多样结构)<br>
[paper](https://arxiv.org/pdf/2103.10022) | [code](https://github.com/USTC-JialunPeng/Diverse-Structure-Inpainting)<br><br>

[5] PISE: Person Image Synthesis and Editing with Decoupled GAN(使用分离的GAN进行人像合成和编辑)<br>
[paper](https://arxiv.org/abs/2103.04023) | [code](https://github.com/Zhangjinso/PISE)<br><br>

[4] DeFLOCNet: Deep Image Editing via Flexible Low-level Controls(通过灵活的低级控件进行深度图像编辑)<br>
[paper](https://arxiv.org/abs/2103.12723)<br><br>

[3] PD-GAN: Probabilistic Diverse GAN for Image Inpainting(用于图像修复的概率多样GAN)<br>
[paper](https://arxiv.org/abs/2105.02201)<br><br>

[2] Anycost GANs for Interactive Image Synthesis and Editing(用于交互式图像合成和编辑的AnyCost Gans)<br>
[paper](https://arxiv.org/abs/2103.03243) | [code](https://github.com/mit-han-lab/anycost-gan)<br><br>

[1] Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing(利用GAN中潜在的空间维度进行实时图像编辑)<br>
[paper](https://arxiv.org/abs/2104.14754) | [code](https://github.com/naver-ai/StyleMapGAN)<br><br>

<a name="ImageTranslation"/> 

### Image Translation

[9] CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation(CoCosNet v2：用于图像翻译的全分辨率函授学习)<br>
[paper](https://arxiv.org/abs/2012.02047)<br><br>

[8] Visualizing Adapted Knowledge in Domain Transfer(领域转移中适应性知识的可视化)<br>
[paper](https://arxiv.org/abs/2104.10602) | [code](https://github.com/hou-yz/DA_visualization)<br><br>

[7] Memory-guided Unsupervised Image-to-image Translation(内存引导的无监督图像到图像翻译)<br>
[paper](https://arxiv.org/abs/2104.05170)<br><br>

[6] ReMix: Towards Image-to-Image Translation with Limited Data(使用有限的数据实现图像到图像的翻译)<br>
[paper](https://arxiv.org/abs/2103.16835)<br><br>

[5] Closing the Loop: Joint Rain Generation and Removal via Disentangled Image Translation(闭环：通过解图像翻译联合产生和去除雨水)<br>
[paper](https://arxiv.org/abs/2103.13660)<br><br>
 
[4] CoMoGAN: continuous model-guided image-to-image translation(连续的模型指导的图像到图像翻译)<br>
[paper](https://arxiv.org/abs/2103.06879) | [code](http://github.com/cv-rits/CoMoGAN)<br><br>

[3] Spatially-Adaptive Pixelwise Networks for Fast Image Translation(空间自适应像素网络，用于快速图像翻译)<br>
[paper](https://arxiv.org/abs/2012.02992) | [project](https://tamarott.github.io/ASAPNet_web/)<br><br>

[2] Image-to-image Translation via Hierarchical Style Disentanglement<br>
[paper](https://arxiv.org/abs/2103.01456) | [code](https://github.com/imlixinyang/HiSD) | [commentary: hierarchical style disentanglement finally makes multi-attribute face manipulation controllable](https://zhuanlan.zhihu.com/p/354258056)<br><br>

[1] Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation(样式编码：用于图像到图像翻译的StyleGAN编码器)<br>
[paper](https://arxiv.org/abs/2008.00951) | [code](https://github.com/eladrich/pixel2style2pixel) | [project](https://eladrich.github.io/pixel2style2pixel/)<br><br>

<a name="IQA"/> 

### Image Quality Assessment

[1] SDD-FIQA: Unsupervised Face Image Quality Assessment with Similarity Distribution Distance(具有相似分布距离的无监督人脸图像质量评估)<br>
[paper](https://arxiv.org/abs/2103.05977)<br><br>

<a name="StyleTransfer"/> 

### Style Transfer

[6] Style-Aware Normalized Loss for Improving Arbitrary Style Transfer(一种改进任意风格转换的风格感知归一化损失算法)<br>
[paper](https://arxiv.org/abs/2104.10064)<br><br>

[5] Instagram Filter Removal on Fashionable Images(删除时尚图片上的Instagram滤镜)<br>
[paper](https://arxiv.org/abs/2104.05072)<br><br>

[4] Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer(起草和修订：拉普拉斯金字塔网络，用于快速高质量的艺术风格转移)<br>
[paper](https://arxiv.org/abs/2104.05376) | [code](https://github.com/PaddlePaddle/PaddleGAN/)<br><br>

[3] Rethinking and Improving the Robustness of Image Style Transfer(重新思考和改善图像风格迁移的鲁棒性)<br>
[paper](https://arxiv.org/abs/2104.05623)<br><br>

[2] ArtFlow: Unbiased Image Style Transfer via Reversible Neural Flows(通过可逆神经流进行无偏的图像风格迁移)<br>
[paper](https://arxiv.org/abs/2103.16877)<br><br>

[1] Rethinking Style Transfer: From Pixels to Parameterized Brushstrokes(重新考虑风格迁移：从像素到参数化笔触)<br>
[paper](https://arxiv.org/abs/2103.17185)<br><br>

<br>

<a name="Face"/> 

## Face

[6] Continuous Face Aging via Self-estimated Residual Age Embedding(通过自我估计的残差年龄嵌入来实现连续的面部老化)<br>
[paper](https://arxiv.org/abs/2105.00020)<br><br>

[5] Towards High Fidelity Face Relighting with Realistic Shadows(逼真的阴影逼真的高保真面部)<br>
[paper](https://arxiv.org/abs/2104.00825)<br><br>

[4] Unsupervised Disentanglement of Linear-Encoded Facial Semantics(线性编码的面部语义的无监督解缠)<br>
[paper](https://arxiv.org/abs/2103.16605)<br><br>

[3] High-fidelity Face Tracking for AR/VR via Deep Lighting Adaptation(通过深度照明自适应实现AR / VR的高保真人脸跟踪)<br>
[paper](https://arxiv.org/abs/2103.15876) | [project](https://www.cs.rochester.edu/u/lchen63)<br><br>

[2] Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes(具有10^7个节点的大规模图上的结构感知人脸聚类)<br>
[paper](https://arxiv.org/abs/2103.13225) | [code&project](https://sstzal.github.io/STAR-FC/)<br><br>

[1] SDD-FIQA: Unsupervised Face Image Quality Assessment with Similarity Distribution Distance(具有相似分布距离的无监督人脸图像质量评估)<br>
[paper](https://arxiv.org/abs/2103.05977)<br><br>

<a name="FacialRecognition"/> 

### Facial Recognition/Detection

[12] Dynamic Class Queue for Large Scale Face Recognition In the Wild(野外大规模人脸识别的动态类队列)<br>
[paper](https://arxiv.org/abs/2105.11113) | [code](https://github.com/bilylee/DCQ)<br><br>

[11] Feature Decomposition and Reconstruction Learning for Effective Facial Expression Recognition(特征分解与重构学习对有效的面部表情识别)<br>
[paper](https://arxiv.org/abs/2104.05160)<br><br>

[10] FACESEC: A Fine-grained Robustness Evaluation Framework for Face Recognition Systems(FACESEC：用于人脸识别系统的细粒度鲁棒性评估框架)<br>
[paper](https://arxiv.org/abs/2104.04107)<br><br>

[9] IronMask: Modular Architecture for Protecting Deep Face Template(用于保护深脸模板的模块化体系结构)<br>
[paper](https://arxiv.org/abs/2104.02239)<br><br>

[8] HLA-Face: Joint High-Low Adaptation for Low Light Face Detection(用于低光人脸检测的联合高低适应)<br>
[paper](https://arxiv.org/abs/2104.01984) | [project](https://daooshee.github.io/HLA-Face-Website/)<br><br>

[7] Dive into Ambiguity: Latent Distribution Mining and Pairwise Uncertainty Estimation for Facial Expression Recognition(潜入歧义：面部表情识别的潜在分布挖掘和成对不确定性估计)<br>
[paper](https://arxiv.org/abs/2104.00232)<br><br>

[6] Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition(情感过程：情感和面部表情识别的时态随机模型)<br>
[paper](https://arxiv.org/abs/2103.13372)<br><br>

[5] Cross-Domain Similarity Learning for Face Recognition in Unseen Domains(跨域相似性学习在未知领域中的人脸识别)<br>
[paper](https://arxiv.org/abs/2103.07503)<br><br>

[4] MagFace: A Universal Representation for Face Recognition and Quality Assessment(MagFace：人脸识别和质量评估的通用表示形式)<br>
[paper](https://arxiv.org/abs/2103.06627) | [code](https://github.com/IrvingMeng/MagFace)<br><br>

[3] CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement(用于模型不可知的面部检测细化的置信度排名)<br>
[paper](https://arxiv.org/abs/2103.07017)<br><br>

[2] A 3D GAN for Improved Large-pose Facial Recognition(用于改善大姿势面部识别的3D GAN)<br>
[paper](https://arxiv.org/pdf/2012.10545.pdf)<br><br>

[1] WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition(揭示了百万级深度人脸识别力量的基准测试)<br>
[paper](https://arxiv.org/abs/2103.04098) | [benchmark](https://www.face-benchmark.org/)<br><br>

<a name="FaceSynthesis"/> 

### Face Generation/Face Synthesis/Face Reconstruction/Face Editing

[14] Lifting 2D StyleGAN for 3D-Aware Face Generation(提升 2D StyleGAN 以生成 3D 感知人脸)<br>
[paper](https://arxiv.org/abs/2011.13126)<br><br>

[13] Monocular Real-time Full Body Capture with Inter-part Correlations(具有部分间相关性的单眼实时全身捕获)<br>
[paper](https://arxiv.org/abs/2012.06087)<br><br>

[12] Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation(基于隐式模块化视听表示的姿态可控人脸生成)<br>
[paper](https://arxiv.org/abs/2104.11116) | [project](https://hangz-nju-cuhk.github.io/projects/PC-AVS)<br><br>

[11] Audio-Driven Emotional Video Portraits(音频情感视频肖像)<br>
[paper](https://arxiv.org/abs/2104.07452)<br><br>

[10] Pixel Codec Avatars(像素编解码器头像)<br>
[paper](https://arxiv.org/abs/2104.04638)<br><br>

[9] Riggable 3D Face Reconstruction via In-Network Optimization(通过网络内优化进行可操纵的3D人脸重建)<br>
[paper](https://arxiv.org/abs/2104.03493) | [code](https://github.com/zqbai-jeremy/INORig)<br><br>

[8] Everything's Talkin': Pareidolia Face Reenactment(一切都在说话'：帕累多利亚脸部重现)<br>
[paper](https://arxiv.org/abs/2104.03061) | [project](https://wywu.github.io/projects/ETT/ETT.html)<br><br>

[7] High-Fidelity and Arbitrary Face Editing(高保真和任意脸部编辑)<br>
[paper](https://arxiv.org/abs/2103.15814)<br><br>

[6] 3DCaricShop: A Dataset and A Baseline Method for Single-view 3D Caricature Face Reconstruction(单视图3D漫画面部重建的数据集和基线方法)<br>
[paper](https://arxiv.org/pdf/2103.08204.pdf) | [project](https://qiuyuda.github.io/3DCaricShop/)<br><br>

[5] ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis(进行全面伪造分析的多功能基准)<br>
[paper](https://arxiv.org/abs/2103.05630) | [project](https://yinanhe.github.io/projects/forgerynet.html)<br><br>

[4] Image-to-image Translation via Hierarchical Style Disentanglement(通过分层样式分解实现图像到图像的翻译)<br>
[paper](https://arxiv.org/abs/2103.01456) | [code](https://github.com/imlixinyang/HiSD)<br><br>

[3] When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework(当年龄不变的人脸识别遇到人脸年龄合成时：一个多任务学习框架)<br>
[paper](https://arxiv.org/abs/2103.01520) | [code](https://github.com/Hzzone/MTLFace)<br><br>

[2] PISE: Person Image Synthesis and Editing with Decoupled GAN(使用分离的GAN进行人像合成和编辑)<br>
[paper](https://arxiv.org/abs/2103.04023) | [code](https://github.com/Zhangjinso/PISE)<br><br>

[1] Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders(分析和改进自省变分自动编码器)<br>
[paper](https://arxiv.org/pdf/2012.13253.pdf) | [code](https://github.com/taldatech/soft-intro-vae-pytorch) | [project](https://taldatech.github.io/soft-intro-vae-web/)<br><br>

<a name="FaceAnti-Spoofing"/> 

### Face Forgery/Face Anti-Spoofing

[6] Improving the Efficiency and Robustness of Deepfakes Detection through Precise Geometric Features(通过精确的几何特征提高假脸检测的效率和鲁棒性)<br>
[paper](https://arxiv.org/abs/2104.04480)<br><br>

[5] Face Forensics in the Wild(a face forgery dataset)<br>
[paper](https://arxiv.org/abs/2103.16076) | [code](https://github.com/tfzhou/FFIW)<br><br>

[4] Frequency-aware Discriminative Feature Learning Supervised by Single-Center Loss for Face Forgery Detection(【人脸伪造检测】由单中心损失监督的频率感知判别特征学习，用于人脸伪造检测)<br>
[paper](https://arxiv.org/abs/2103.09096)<br><br>

[3] MagDR: Mask-guided Detection and Reconstruction for Defending Deepfakes(面罩引导的检测和重建，以防御深造假)<br>
[paper](https://arxiv.org/abs/2103.14211)<br><br>

[2] Cross Modal Focal Loss for RGBD Face Anti-Spoofing(跨模态焦点损失，用于RGBD人脸反欺骗)<br>
[paper](https://arxiv.org/abs/2103.00948)<br><br>

[1] Multi-attentional Deepfake Detection(多注意的Deepfake检测)<br>
[paper](https://arxiv.org/abs/2103.02406)<br><br>



<br>

<a name="ObjectTracking"/> 

## Object Tracking

[18] Alpha-Refine: Boosting Tracking Performance by Precise Bounding Box Estimation(Alpha-Refine：通过精确的边界框估计来提高跟踪性能)<br>
[paper](https://arxiv.org/abs/2012.06815) | [code](https://github.com/MasterBin-IIAU/AlphaRefine)<br><br>

[17] LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search(LightTrack：通过一站式架构搜索找到用于跟踪对象的轻型神经网络)<br>
[paper](https://arxiv.org/abs/2104.14545) | [code](https://github.com/researchmm/LightTrack)<br><br>

[16] Multiple Object Tracking with Correlation Learning(相关学习的多目标跟踪)<br>
[paper](https://arxiv.org/abs/2104.03541)<br><br>

[15] Learning to Track Instances without Video Annotations(学习在没有视频注释的情况下跟踪实例)<br>
[paper](https://arxiv.org/abs/2104.00287)<br><br>

[14] STMTrack: Template-free Visual Tracking with Space-time Memory Networks(具有时空存储网络的无模板视觉跟踪)<br>
[paper](https://arxiv.org/abs/2104.00324) | [code](https://github.com/fzh0917/STMTrack)<br><br>

[13] Online Multiple Object Tracking with Cross-Task Synergy(具有跨任务协同作用的在线多对象跟踪)<br>
[paper](https://arxiv.org/abs/2104.00380) | [code](https://github.com/songguocode/TADAM)<br><br>

[12] Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark(使用自然语言实现更灵活，准确的对象跟踪：算法和基准)<br>
[paper](https://arxiv.org/abs/2103.16746)<br><br>

[11] Learnable Graph Matching: Incorporating Graph Partitioning with Deep Feature Learning for Multiple Object Tracking(可学习的图匹配：将图分区与深度特征学习相结合以实现多对象跟踪)<br>
[paper](https://arxiv.org/abs/2103.16178) | [code](https://github.com/jiaweihe1996/GMTracker)<br><br>

[10] IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking(IoU攻击：针对视觉对象跟踪的临时相干黑盒对抗攻击)<br>
[paper](https://arxiv.org/abs/2103.14938) | [code](https://github.com/VISION-SJTU/IoUattack)<br><br>

[9] Transformer Tracking(Transformer跟踪)<br>
[paper](https://arxiv.org/abs/2103.15436) | [code](https://github.com/chenxin-dlut/TransT)<br><br>

[8] Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking(Transformer与追踪器相遇：利用时间上下文进行可靠的视觉追踪)<br>
[paper](https://arxiv.org/pdf/2103.11681)<br><br>

[7] Track to Detect and Segment: An Online Multi-Object Tracker(跟踪检测和分段：在线多目标跟踪器)<br>
[paper](https://arxiv.org/abs/2103.08808) | [project](https://jialianwu.com/projects/TraDeS.html)<br><br>

[6] Learning a Proposal Classifier for Multiple Object Tracking(用于多对象跟踪的分类器)<br>
[paper](https://arxiv.org/abs/2103.07889) | [code](https://github.com/daip13/LPC_MOT.git)<br><br>

[5] Center-based 3D Object Detection and Tracking(基于中心的3D目标检测和跟踪)<br>
[paper](https://arxiv.org/abs/2006.11275) | [code](https://github.com/tianweiy/CenterPoint)<br><br>

[4] HPS: localizing and tracking people in large 3D scenes from wearable sensors(通过可穿戴式传感器对大型3D场景中的人进行定位和跟踪)<br><br>

[3] Track to Detect and Segment: An Online Multi-Object Tracker(跟踪检测和分段：在线多对象跟踪器)<br>
[paper](https://arxiv.org/abs/2103.08808) | [project](https://jialianwu.com/projects/TraDeS.html) | [video](https://www.youtube.com/watch?v=oGNtSFHRZJA)<br><br>

[2] Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking(多目标跟踪的概率小波计分和修复)<br>
[paper](https://arxiv.org/abs/2012.02337)<br><br>

[1] Rotation Equivariant Siamese Networks for Tracking(旋转等距连体网络进行跟踪)<br>
[paper](https://arxiv.org/abs/2012.13078)<br><br>


<br><br>
<a name="ImageRetrieval"/> 

## Image&Video Retrieval/Video Understanding


[11] VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval(VIGOR：超越一对一检索的交叉视图图像地理定位)<br>
[paper](https://arxiv.org/abs/2011.12172) | [dataset&project](https://github.com/Jeff-Zilence/VIGOR)<br><br>

[10] Compatibility-aware Heterogeneous Visual Search(兼容感知的异构视觉搜索)<br>
[paper](https://arxiv.org/abs/2105.06047)<br><br>

[9] 2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition(2D还是2D？ 自适应3D卷积选择以实现有效的视频识别)<br>
[paper](https://arxiv.org/abs/2012.14950)<br><br>

[8] FrameExit: Conditional Early Exiting for Efficient Video Recognition(【视频理解】帧退出：有条件提前退出以实现有效的视频识别)<br>
[paper](https://arxiv.org/abs/2104.13400)<br><br>

[7] T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval(T2VLAD：用于文本视频检索的全局局部序列比对)<br>
[paper](https://arxiv.org/abs/2104.10054)<br><br>

[6] Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers(快速思考和缓慢思考：使用变压器进行高效的文本到视觉检索)<br>
[paper](https://arxiv.org/abs/2103.16553)<br><br>

[5] StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval(StyleMeUp：迈向与风格无关的基于草图的图像检索)<br>
[paper](https://arxiv.org/abs/2103.15706)<br><br>

[4] More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval(您只需要更多照片：基于半监督学习的细粒度基于草图的图像检索)<br>
[paper](https://arxiv.org/abs/2103.13990) | [code](https://github.com/AyanKumarBhunia/semisupervised-FGSBIR)<br><br>

[3] Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning(使用分层Transformer和自我监督学习改进跨模态食谱检索)<br>
[paper](https://arxiv.org/pdf/2103.13061.pdf)<br><br>

[2] On Semantic Similarity in Video Retrieval(视频检索中的语义相似度)<br>
[paper](https://arxiv.org/abs/2103.10095) | [code](https://mwray.github.io/SSVR/)<br><br>

[1] QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval(实用的查询高效的图像检索黑盒攻击)<br>
[paper](https://arxiv.org/abs/2103.02927)<br><br>



<a name="ActionRecognition"/> 

### Action/Activity Recognition, Detection, Segmentation and Localization

[25] Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization(通过对比交叉视图互信息最大化学习视图解开人体姿势表示)<br>
[paper](https://arxiv.org/abs/2012.01405) | [code](https://github.com/google-research/google-research/tree/master/poem)<br><br>

[24] Anticipating human actions by correlating past with the future with Jaccard similarity measures(用Jaccard相似性测度预测人类行为)<br>
[paper](https://arxiv.org/abs/2105.12414)<br><br>

[23] Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation(语言查询视频演员分割的协作时空建模)<br>
[paper](https://arxiv.org/abs/2105.06818)<br><br>

[22] Home Action Genome: Cooperative Compositional Action Understanding(家庭行动基因组：合作组成行动的理解)<br>
[paper](https://arxiv.org/abs/2105.05226)<br><br>

[21] Weakly Supervised Action Selection Learning in Video(视频中的弱监督动作选择学习)<br>
[paper](https://arxiv.org/abs/2105.02439) | [code](https://github.com/layer6ai-labs/ASL)<br><br>

[20] Global2Local: Efficient Structure Search for Video Action Segmentation<br>
[paper](https://arxiv.org/abs/2101.00910) | [code](https://github.com/ShangHua-Gao/G2L-search)<br><br>

[19] Self-Supervised Learning for Semi-Supervised Temporal Action Proposal(自我监督学习的半监督时间行动建议)<br>
[paper](https://arxiv.org/abs/2104.03214)<br><br>

[18] Anchor-Constrained Viterbi for Set-Supervised Action Segmentation(锚约束维特比用于集合监督的动作分割)<br>
[paper](https://arxiv.org/abs/2104.02113)<br><br>

[17] Action Shuffle Alternating Learning for Unsupervised Action Segmentation(动作洗牌交替学习，实现无监督动作分割)<br>
[paper](https://arxiv.org/abs/2104.02116)<br><br>

[16] Self-supervised Motion Learning from Static Images(从静态图像进行自我监督的运动学习)<br>
[paper](https://arxiv.org/abs/2104.00240)<br><br>

[15] CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning(带有片段对比学习的弱监督实时动作定位)<br>
[paper](https://arxiv.org/abs/2103.16392)<br><br>

[14] Recognizing Actions in Videos from Unseen Viewpoints(从看不见的角度识别视频中的动作)<br>
[paper](https://arxiv.org/abs/2103.16516)<br><br>

[13] No frame left behind: Full Video Action Recognition(没有残影：完整的视频动作识别)<br>
[paper](https://arxiv.org/abs/2103.15395)<br><br>

[12] Learning Salient Boundary Feature for Anchor-free Temporal Action Localization(学习显着边界特征以实现无锚时间动作定位)<br>
[paper](https://arxiv.org/abs/2103.13137) | [code](https://github.com/TencentYoutuResearch/ActionDetection-AFSD)<br><br>

[11] Temporal Context Aggregation Network for Temporal Action Proposal Refinement(时间上下文聚合网络，用于改进时间行动建议)<br>
[paper](https://arxiv.org/abs/2103.13141)<br><br>

[10] The Blessings of Unlabeled Background in Untrimmed Videos(未修饰视频中未标记背景的祝福)<br>
[paper](https://arxiv.org/abs/2103.13183)<br><br>

[9] Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation(临时加权层次聚类，实现无监督动作分割)<br>
[paper](https://arxiv.org/abs/2103.11264) | [code](https://github.com/ssarfraz/FINCH-Clustering/tree/master/TW-FINCH)<br><br>

[8] Coarse-Fine Networks for Temporal Activity Detection in Videos(粗细网络，用于视频中的时间活动检测)<br>
[paper](https://arxiv.org/abs/2103.01302)<br><br>

[7] Learning Discriminative Prototypes with Dynamic Time Warping(通过动态时间扭曲学习判别性原型)<br>
[paper](https://arxiv.org/pdf/2103.09458.pdf)<br><br>

[6] Temporal Action Segmentation from Timestamp Supervision(时间监督中的时间动作分割)<br>
[paper](https://arxiv.org/abs/2103.06669)<br><br>

[5] ACTION-Net: Multipath Excitation for Action Recognition(用于动作识别的多路径激励)<br>
[paper](https://arxiv.org/abs/2103.07372) | [code](https://github.com/V-Sense/ACTION-Net)<br><br>

[4] BASAR:Black-box Attack on Skeletal Action Recognition(骨骼动作识别的黑匣子攻击)<br>
[paper](https://arxiv.org/abs/2103.05266)<br><br>

[3] Understanding the Robustness of Skeleton-based Action Recognition under Adversarial Attack(了解对抗攻击下基于骨骼的动作识别的鲁棒性)<br>
[paper](https://arxiv.org/pdf/2103.05347.pdf)<br><br>

[2] Temporal Difference Networks for Efficient Action Recognition(用于有效动作识别的时差网络)<br>
[paper](https://arxiv.org/abs/2012.10071) | [code](https://github.com/MCG-NJU/TDN)<br><br>

[1] Behavior-Driven Synthesis of Human Dynamics(行为驱动的人类动力学综合)<br>
[paper](https://arxiv.org/pdf/2103.04677.pdf) | [code](https://compvis.github.io/behavior-driven-video-synthesis/)<br><br>


<a name="Re-Identification"/> 

### Person Re-Identification/Detection

[15] Generalizable Person Re-identification with Relevance-aware Mixture of Experts(具有相关性感知混合专家的可泛化的行人重识别)<br>
[paper](https://arxiv.org/abs/2105.09156)<br><br>

[14] Learning to Generalize Unseen Domains via Memory-based Multi-Source Meta-Learning for Person Re-Identification(通过基于记忆的多源元学习来学习概括看不见的域以进行人员重新识别)<br>
[paper](https://arxiv.org/abs/2012.00417)<br><br>

[13] Joint Generative and Contrastive Learning for Unsupervised Person Re-identification(联合生成和对比学习，用于无监督人员重新识别)<br>
[paper](https://arxiv.org/abs/2012.09071) | [code](https://github.com/chenhao2345/GCL)<br><br>

[12] Unsupervised Pre-training for Person Re-identification(对人员进行重新识别的无监督预训练)<br>
[paper](https://arxiv.org/abs/2012.03753)<br><br>

[11] BiCnet-TKS: Learning Efficient Spatial-Temporal Representation for Video Person Re-Identification(BiCnet-TKS：学习有效的时空表示以重新识别视频人)<br>
[paper](https://arxiv.org/abs/2104.14783)<br><br>

[10] Unsupervised Multi-Source Domain Adaptation for Person Re-Identification(用于行人重新识别的无监督多源域适配)<br>
[paper](https://arxiv.org/abs/2104.12961)<br><br>

[9] Combined Depth Space based Architecture Search For Person Re-identification(基于组合深度空间的架构搜索以进行行人重识别)<br>
[paper](https://arxiv.org/abs/2104.04163)<br><br>

[8] Neural Feature Search for RGB-Infrared Person Re-Identification(神经特征搜索以重新识别RGB红外人)<br>
[paper](https://arxiv.org/abs/2104.02366)<br><br>

[7] Group-aware Label Transfer for Domain Adaptive Person Re-identification(组感知标签传输，用于域自适应行人重识别)<br>
[paper](https://arxiv.org/abs/2103.12366)<br><br>

[6] Lifelong Person Re-Identification via Adaptive Knowledge Accumulation(通过自适应知识积累对终身行人重识别)<br>
[paper](https://arxiv.org/abs/2103.12462)<br><br>

[5] Anchor-Free Person Search(Anchor-Free行人搜索)<br>
[paper](https://arxiv.org/abs/2103.11617) | [code](https://github.com/daodaofr/AlignPS)<br><br>

[4] Intra-Inter Camera Similarity for Unsupervised Person Re-Identification(摄像机内部相似度用于无监督人员重新识别)<br>
[paper](https://arxiv.org/abs/2103.11658)<br><br>

[3] Watching You: Global-guided Reciprocal Learning for Video-based Person Re-identification(基于视频的人员重新识别的全球指导对等学习)<br>
[paper](https://arxiv.org/abs/2103.04337)<br><br>

[2] Joint Noise-Tolerant Learning and Meta Camera Shift Adaptation for Unsupervised Person Re-Identification(联合抗噪学习和元相机移位自适应，用于无监督人员的重新识别)<br>
[paper](https://arxiv.org/abs/2103.04618)<br><br>

[1] Meta Batch-Instance Normalization for Generalizable Person Re-Identification(通用批处理人员重新标识的元批实例规范化)<br>
[paper](https://arxiv.org/abs/2011.14670)<br><br>

<a name="VideoCaption"/> 

### Image/Video Captioning

[7] Towards Accurate Text-based Image Captioning with Content Diversity Exploration(借助内容多样性探索实现精确的基于文本的图像字幕)<br>
[paper](https://arxiv.org/abs/2105.03236)<br><br>

[6] Human-like Controllable Image Captioning with Verb-specific Semantic Roles(具有动词特定语义作用的类人可控图像字幕)<br>
[paper](https://arxiv.org/abs/2103.12204) | [code](https://github.com/mad-red/VSR-guided-CIC)<br><br>

[5] Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos(语义注意的共同接地网络，用于引用视频中的表达理解)<br>
[paper](https://arxiv.org/pdf/2103.12346.pdf) | [project](https://sijiesong.github.io/co-grounding)<br><br>

[4] Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles(多实例字幕：从组织病理学教科书和文章中学习表示形式)<br>
[paper](https://arxiv.org/pdf/2103.05121.pdf)<br><br>

[3] Open-book Video Captioning with Retrieve-Copy-Generate Network(带有检索复制生成网络的开卷视频字幕)<br>
[paper](https://arxiv.org/pdf/2103.05284.pdf)<br><br>

[2] VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs(基于视频的文本生成的端到端学习来自多模式输入)<br>
[paper](https://arxiv.org/pdf/2101.12059.pdf)<br><br>

[1] Scan2Cap: Context-aware Dense Captioning in RGB-D Scans(RGB-D扫描中的上下文感知密集字幕)<br>
[paper](https://arxiv.org/abs/2012.02206) | [code](https://github.com/daveredrum/Scan2Cap) | [project](https://daveredrum.github.io/Scan2Cap/) | [video](https://youtu.be/AgmIpDbwTCY)<br><br>


<br>

<a name="MedicalImaging"/> 

## Medical Imaging

[14] DARCNN: Domain Adaptive Region-based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images(DARCNN：用于生物医学图像中无监督实例分割的基于域自适应区域的卷积神经网络)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Hsu_DARCNN_Domain_Adaptive_Region-Based_Convolutional_Neural_Network_for_Unsupervised_Instance_CVPR_2021_paper.pdf)<br><br>

[13] Every Annotation Counts: Multi-label Deep Supervision for Medical Image Segmentation(每种注释都至关重要：医学图像分割的多标签深度监管)<br>
[paper](https://arxiv.org/abs/2104.13243)<br><br>

[12] DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation(DiNTS：用于3D医学图像分割的可区分神经网络拓扑搜索)<br>
[paper](https://arxiv.org/abs/2103.15954)<br><br>

[11] Confluent Vessel Trees with Accurate Bifurcations(分叉的融合容器树)<br>
[paper](https://arxiv.org/abs/2103.14268)<br><br>

[10] Brain Image Synthesis with Unsupervised Multivariate Canonical CSCℓ4Net(无监督多元规范CSCℓ4Net的脑图像合成)<br>
[paper](https://arxiv.org/pdf/2103.11587.pdf)<br><br>

[9] XProtoNet: Diagnosis in Chest Radiography with Global and Local Explanations(使用全局和局部解释诊断胸部X光片)<br>
[paper](https://arxiv.org/pdf/2103.10663.pdf)<br><br>

[8] FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space(在连续频率空间中通过情景学习进行医学图像分割的联合域泛化)<br>
[paper](https://arxiv.org/abs/2103.06030) | [code](https://github.com/liuquande/FedDG-ELCFS)<br><br>

[7] Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles(多实例字幕：从组织病理学教科书和文章中学习表示形式)<br>
[paper](https://arxiv.org/pdf/2103.05121.pdf)<br><br>

[6] Discovering Hidden Physics Behind Transport Dynamics(在运输动力学背后发现隐藏物理)<br>
[paper](https://arxiv.org/abs/2011.12222)<br><br>

[5] DeepTag: An Unsupervised Deep Learning Method for Motion Tracking on Cardiac Tagging Magnetic Resonance Images(一种心脏标记磁共振图像运动跟踪的无监督深度学习方法)<br>
[paper](https://arxiv.org/abs/2103.02772)<br><br>

[4] Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning(多机构协作改进基于深度学习的联合学习磁共振图像重建)<br>
[paper](https://arxiv.org/abs/2103.02148) | [code](https://github.com/guopengf/FLMRCM)<br><br>

[3] 3D Graph Anatomy Geometry-Integrated Network for Pancreatic Mass Segmentation, Diagnosis, and Quantitative Patient Management(用于胰腺肿块分割，诊断和定量患者管理的3D图形解剖学几何集成网络)<br><br>

[2] Deep Lesion Tracker: Monitoring Lesions in 4D Longitudinal Imaging Studies(深部病变追踪器：在4D纵向成像研究中监控病变)<br>
[paper](https://arxiv.org/abs/2012.04872)<br><br>

[1] Automatic Vertebra Localization and Identification in CT by Spine Rectification and Anatomically-constrained Optimization(通过脊柱矫正和解剖学约束优化在CT中自动进行椎骨定位和识别)<br>
[paper](https://arxiv.org/abs/2012.07947)<br><br>

<br>

<a name="TDR"/> 


## Text Detection/Recognition

[7] TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text(TextOCR：对任意形状的场景文本进行大规模的端到端推理)<br>
[paper](https://arxiv.org/abs/2105.05486) | [project](https://textvqa.org/textocr)<br><br>

[6] Fourier Contour Embedding for Arbitrary-Shaped Text Detection(基于Fourier轮廓嵌入的任意形状文本检测)<br>
[paper](https://arxiv.org/abs/2104.10442)<br><br>

[5] Scene Text Retrieval via Joint Text Detection and Similarity Learning(通过联合文本检测和相似性学习检索场景文本)<br>
[paper](https://arxiv.org/abs/2104.01552) | [code](https://github.com/lanfeng4659/STR-TDSL)<br><br>

[4] MetaHTR: Towards Writer-Adaptive Handwritten Text Recognition(迈向写作者自适应的手写文本识别)<br>
[paper](https://arxiv.org/abs/2104.01876)<br><br>

[3] MOST: A Multi-Oriented Scene Text Detector with Localization Refinement(具有本地化优化功能的多方位场景文本检测器)<br>
[paper](https://arxiv.org/abs/2104.01070)<br><br>

[2] Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition(像人类一样阅读：用于场景文本识别的自主，双向和迭代语言建模)<br>
[paper](https://arxiv.org/abs/2103.06495) | [code](https://github.com/FangShancheng/ABINet)<br><br>

[1] What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels(如果我们仅将真实数据集用于场景文本识别该怎么办？ 带有较少标签的场景文本识别)<br>
[paper](https://arxiv.org/abs/2103.04400) | [code](https://github.com/ku21fan/STR-Fewer-Labels)<br><br>

<br>

<a name="RSI"/> 

## 遥感图像(Remote Sensing Image)

[3] SIPSA-Net: Shift-Invariant Pan Sharpening with Moving Object Alignment for Satellite Imagery(SIPSA-Net：带有移动目标对准的卫星图像平移不变锐化)<br>
[paper](https://arxiv.org/abs/2105.02400)<br><br>

[2] PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation(语义流经点以进行航空图像分割)<br>
[paper](https://arxiv.org/pdf/2103.06564.pdf)<br><br>

[1] Deep Gradient Projection Networks for Pan-sharpening(【超分辨率】泛锐化的深梯度投影网络)<br>
[paper](https://arxiv.org/pdf/2103.04584.pdf) | [code](https://github.com/xsxjtu/GPPNN)<br><br>


<br>

<a name="GAN"/> 

## GAN/生成式/对抗式(GAN/Generative/Adversarial)

[27] Enhancing the Transferability of Adversarial Attacks through Variance Tuning(通过方差调整增强对抗性攻击的可转移性)<br>
[paper](https://arxiv.org/abs/2103.15571) | [code](https://github.com/JHL-HUST/VT)<br><br>

[26] HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms(HistoGAN：通过颜色直方图控制 GAN 生成的图像和真实图像的颜色)<br>
[paper](https://arxiv.org/abs/2011.11731)<br><br>

[25] Continuous Face Aging via Self-estimated Residual Age Embedding(通过自我估计的残差年龄嵌入来实现连续的面部老化)<br>
[paper](https://arxiv.org/abs/2105.00020)<br><br>

[24] Unsupervised 3D Shape Completion through GAN Inversion(通过GAN反演实现无监督3D形状补全)<br>
[paper](https://arxiv.org/abs/2104.13366) | [project](https://junzhezhang.github.io/projects/ShapeInversion/)<br><br>

[23] Delving into Data: Effectively Substitute Training for Black-box Attack(深入研究数据：有效替代黑盒攻击的培训)<br>
[paper](https://arxiv.org/abs/2104.12378)<br><br>

[22] LAFEAT: Piercing Through Adversarial Defenses with Latent Features(LAFEAT：通过具有潜在功能的对抗性防御突围)<br>
[paper](https://arxiv.org/abs/2104.09284)<br><br>

[21] Surrogate Gradient Field for Latent Space Manipulation(潜在空间操纵的替代梯度场)<br>
[paper](https://arxiv.org/abs/2104.09065)<br><br>

[20] DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort(DatasetGAN：只需最少的人力即可获得的高效标签数据工厂)<br>
[paper](https://arxiv.org/abs/2104.06490)<br><br>

[19] Regularizing Generative Adversarial Networks under Limited Data(在有限数据下对生成性对抗网络进行正则化)<br>
[paper](https://arxiv.org/abs/2104.03310) | [project](https://hytseng0509.github.io/lecam-gan) | [code](https://github.com/google/lecam-gan)<br><br>

[18] Content-Aware GAN Compression(内容感知GAN压缩)<br>
[paper](https://arxiv.org/abs/2104.02244)<br><br>

[17] Lipstick ain't enough: Beyond Color Matching for In-the-Wild Makeup Transfer(口红还不够：超出配色范围的野外化妆效果)<br>
[paper](https://arxiv.org/abs/2104.01867) | [code](https://github.com/VinAIResearch/CPM)<br><br>

[16] LiBRe: A Practical Bayesian Approach to Adversarial Detection(LiBRe：对抗性检测的实用贝叶斯方法)<br>
[paper](https://arxiv.org/abs/2103.14835)<br><br>

[15] DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network(通过对比生成对抗网络进行多种条件图像合成)<br>
[paper](https://arxiv.org/abs/2103.07893)<br><br>

[14] Diverse Semantic Image Synthesis via Probability Distribution Modeling(基于概率分布建模的多种语义图像合成)<br>
[paper](https://arxiv.org/abs/2103.06878) | [code](https://github.com/tzt101/INADE.git)<br><br>

[13] HumanGAN: A Generative Model of Humans Images(人类图像的生成模型)<br>
[paper](https://arxiv.org/abs/2103.06902)<br><br>

[12] MetaSimulator: Simulating Unknown Target Models for Query-Efficient Black-box Attacks(模拟未知目标模型以提高查询效率的黑盒攻击)<br>
[paper](https://arxiv.org/abs/2009.00960) | [code](https://github.com/machanic/MetaSimulator)<br><br>

[11] Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders(分析和改进自省变分自动编码器)<br>
[paper](https://arxiv.org/pdf/2012.13253.pdf) | [code](https://github.com/taldatech/soft-intro-vae-pytorch) | [project](https://taldatech.github.io/soft-intro-vae-web/)<br><br>

[10] LOHO: Latent Optimization of Hairstyles via Orthogonalization(LOHO：通过正交化潜在地优化发型)<br>
[paper](https://arxiv.org/pdf/2103.03891.pdf)<br><br>

[9] PISE: Person Image Synthesis and Editing with Decoupled GAN(使用分离的GAN进行人像合成和编辑)<br>
[paper](https://arxiv.org/abs/2103.04023) | [code](https://github.com/Zhangjinso/PISE)<br><br>

[8] Closed-Form Factorization of Latent Semantics in GANs(GAN中潜在语义的闭式分解)<br>
[paper](https://arxiv.org/abs/2007.06600) | [code](https://github.com/genforce/sefa)<br><br>

[7] PD-GAN: Probabilistic Diverse GAN for Image Inpainting(用于图像修复的概率多样GAN)<br><br>

[6] Anycost GANs for Interactive Image Synthesis and Editing(用于交互式图像合成和编辑的AnyCost Gans)<br>
[paper](https://arxiv.org/abs/2103.03243) | [code](https://github.com/mit-han-lab/anycost-gan)<br><br>

[5] Efficient Conditional GAN Transfer with Knowledge Propagation across Classes(高效的有条件GAN转移以及跨课程的知识传播)<br>
[paper](https://arxiv.org/pdf/2102.06696.pdf) | [code](http://github.com/mshahbazi72/cGANTransfer)<br><br>

[4] Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing（利用GAN中潜在的空间维度进行实时图像编辑）<br><br>

[3] Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs(Hijack-GAN：意外使用经过预训练的黑匣子GAN)<br>
[paper](https://arxiv.org/pdf/2011.14107.pdf)<br><br>

[2] Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation(样式编码：用于图像到图像翻译的StyleGAN编码器)<br>
[paper](https://arxiv.org/abs/2008.00951) | [code](https://github.com/eladrich/pixel2style2pixel) | [project](https://eladrich.github.io/pixel2style2pixel/)<br><br>

[1] A 3D GAN for Improved Large-pose Facial Recognition(用于改善大姿势面部识别的3D GAN)<br>
[paper](https://arxiv.org/pdf/2012.10545.pdf)<br><br>


<br>

<a name="IGIS"/> 

### 图像生成/图像合成(Image Generation/Image Synthesis)

[26] Self-Supervised Collision Handling via Generative 3D Garment Models for Virtual Try-On(通过虚拟试穿的生成式3D服装模型进行自我监督的碰撞处理)<br>
[paper](https://arxiv.org/abs/2105.06462) | [project](http://mslab.es/projects/SelfSupervisedGarmentCollisions)<br><br>

[25] Roof-GAN: Learning to Generate Roof Geometry and Relations for Residential Houses(Roof-GAN：学习为住宅生成屋顶几何图形和关系)<br>
[paper](https://arxiv.org/abs/2012.09340) | [code](https://github.com/yi-ming-qian/roofgan)<br><br>

[24] TediGAN: Text-Guided Diverse Face Image Generation and Manipulation(TediGAN：文本引导的多样面部图像生成和操纵)<br>
[paper](https://arxiv.org/abs/2012.03308) | [code](https://github.com/weihaox/TediGAN) | [video](https://youtu.be/L8Na2f5viAM)<br><br>

[23] GeoSim: Realistic Video Simulation via Geometry-Aware Composition for Self-Driving(GeoSim：通过可自动驾驶的几何感知合成进行逼真的视频模拟)<br>
[paper](https://arxiv.org/abs/2101.06543)<br><br>

[22] GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields(GIRAFFE：将场景表示为合成的生成神经特征场)<br>
[paper](https://arxiv.org/abs/2011.12100) | [project](http://bit.ly/giraffe-project)<br><br>

[21] Ensembling with Deep Generative Views(融入深刻的生成观点)<br>
[paper](https://arxiv.org/abs/2104.14551) | [code](https://github.com/chail/gan-ensembling)<br><br>

[20] StylePeople: A Generative Model of Fullbody Human Avatars(StylePeople：全身人类化身的生成模型)<br>
[paper](https://arxiv.org/abs/2104.08363) | [code](http://saic-violet.github.io/style-people)<br><br>

[19] See through Gradients: Image Batch Recovery via GradInversion(透视渐变：通过GradInversion恢复图像批处理)<br>
[paper](https://arxiv.org/abs/2104.07586)<br><br>

[18] StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis(StEP：用于多模式图像合成的基于样式的编码器预训练)<br>
[paper](https://arxiv.org/abs/2104.07098)<br><br>

[17] Few-shot Image Generation via Cross-domain Correspondence(通过跨域对应小样本图像生成)<br>
[paper](https://arxiv.org/abs/2104.06820)<br><br>

[16] IMAGINE: Image Synthesis by Image-Guided Model Inversion(想象：通过图像指导模型反演的图像合成)<br>
[paper](https://arxiv.org/abs/2104.05895)<br><br>

[15] Variational Transformer Networks for Layout Generation(用于布局生成的变电站网络)<br>
[paper](https://arxiv.org/abs/2104.02416)<br><br>

[14] VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization(通过未对准感知的归一化进行高分辨率的虚拟试戴)<br>
[paper](https://arxiv.org/abs/2103.16874)<br><br>

[13] A Closer Look at Fourier Spectrum Discrepancies for CNN-generated Images Detection(仔细研究CNN生成图像检测的傅立叶光谱差异)<br>
[paper](https://arxiv.org/abs/2103.17195) | [code](https://keshik6.github.io/Fourier-Discrepancies-CNN-Detection/)<br><br>

[12] Semi-supervised Synthesis of High-Resolution Editable Textures for 3D Humans(用于3D人类的高分辨率可编辑纹理的半监督合成)<br>
[paper](https://arxiv.org/abs/2103.17266)<br><br>

[11] Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling(个性化几何和纹理建模的少量人体运动传递)<br>
[paper](https://arxiv.org/abs/2103.14338) | [code](https://github.com/HuangZhiChao95/FewShotMotionTransfer)<br><br>

[10] Brain Image Synthesis with Unsupervised Multivariate Canonical CSCℓ4Net(无监督多元规范CSCℓ4Net的脑图像合成)<br>
[paper](https://arxiv.org/pdf/2103.11587.pdf)<br><br>

[9] Context-Aware Layout to Image Generation with Enhanced Object Appearance(具有增强的对象外观的上下文感知布局到图像生成)<br>
[paper](https://arxiv.org/abs/2103.11897)<br><br>

[8] DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network(通过对比生成对抗网络进行多种条件图像合成)<br>
[paper](https://arxiv.org/abs/2103.07893)<br><br>

[7] HumanGAN: A Generative Model of Humans Images(人类图像的生成模型)<br>
[paper](https://arxiv.org/abs/2103.06902)<br><br>

[6] PISE: Person Image Synthesis and Editing with Decoupled GAN(使用分离的GAN进行人像合成和编辑)<br>
[paper](https://arxiv.org/abs/2103.04023) | [code](https://github.com/Zhangjinso/PISE)<br><br>

[5] SMPLicit: Topology-aware Generative Model for Clothed People(穿衣服的人的拓扑感知生成模型)<br>
[paper](https://arxiv.org/pdf/2103.06871.pdf) | [code](http://www.iri.upc.edu/people/ecorona/smplicit/)<br><br>

[4] Diversifying Sample Generation for Data-Free Quantization（多样化的样本生成，实现无数据量化）<br>
[paper](https://arxiv.org/abs/2103.01049)<br><br>

[3] Diverse Semantic Image Synthesis via Probability Distribution Modeling(基于概率分布建模的多种语义图像合成)<br>
[paper](https://arxiv.org/abs/2103.06878) | [code](https://github.com/tzt101/INADE.git)<br><br>

[2] When Age-Invariant Face Recognition Meets Face Age Synthesis: A  Multi-Task Learning Framework(当年龄不变的人脸识别遇到人脸年龄合成时：一个多任务学习框架)<br>
[paper](https://arxiv.org/abs/2103.01520) | [code](https://github.com/Hzzone/MTLFace)<br><br>

[1] Anycost GANs for Interactive Image Synthesis and Editing(用于交互式图像合成和编辑的AnyCost Gans)<br>
[paper](https://arxiv.org/abs/2103.03243) | [code](https://github.com/mit-han-lab/anycost-gan)<br><br>

<a name="ViewSynthesis"/> 

### 视图合成(View Synthesis)

[7] Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes(用于动态场景时空视图合成的神经场景流场)<br>
[paper](https://arxiv.org/abs/2011.13084) | [project](http://www.cs.cornell.edu/~zl548/NSFF/)<br><br>

[6] Stable View Synthesis(稳定的视图合成)<br>
[paper](https://arxiv.org/abs/2011.07233) | [code](https://github.com/intel-isl/StableViewSynthesis)<br><br>

[5] Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes(立体辐射场（SRF）：学习新颖的场景的稀疏视图的视图合成)<br>
[paper](https://arxiv.org/abs/2104.06935) | [project](https://virtualhumans.mpi-inf.mpg.de/srf/)<br><br>

[4] Layout-Guided Novel View Synthesis from a Single Indoor Panorama(单一室内全景的布局引导式新颖视图合成)<br>
[paper](https://arxiv.org/abs/2103.17022) | [project](https://github.com/bluestyle97/PNVS)<br><br>

[3] NeX: Real-time View Synthesis with Neural Basis Expansion(NeX：具有神经基础扩展的实时视图合成)<br>
[paper](https://arxiv.org/abs/2103.05606) | [code](https://nex-mpi.github.io/)<br><br>

[2] ID-Unet: Iterative Soft and Hard Deformation for View Synthesis(视图合成的迭代软硬变形)<br>
[paper](https://arxiv.org/abs/2103.02264)<br><br>

[1] Self-Supervised Visibility Learning for Novel View Synthesis(自我监督的可视性学习，用于新颖的视图合成)<br>
[paper](https://arxiv.org/abs/2103.15407)<br><br>

<br>

<a name="3DVision"/> 

## 三维视觉(3D Vision)

[8] 3D Spatial Recognition without Spatially Labeled 3D(没有空间标记的3D的3D空间识别)<br>
[paper](https://arxiv.org/abs/2105.06461)<br><br>

[7] Iso-Points: Optimizing Neural Implicit Surfaces with Hybrid Representations(等值点：使用混合表示优化神经隐式曲面)<br>
[paper](https://arxiv.org/abs/2012.06434) | [code](https://github.com/yifita/iso-points)<br><br>

[6] Learning Feature Aggregation for Deep 3D Morphable Models(深度3D可变形模型的学习特征聚合)<br>
[paper](https://arxiv.org/abs/2105.02173)<br><br>

[5] Deep Polarization Imaging for 3D shape and SVBRDF Acquisition(用于3D形状和SVBRDF采集的深偏振成像)<br>
[paper](https://arxiv.org/abs/2105.02875)<br><br>

[4] Unsupervised 3D Shape Completion through GAN Inversion(通过GAN反演实现无监督3D形状补全)<br>
[paper](https://arxiv.org/abs/2104.13366) | [project](https://junzhezhang.github.io/projects/ShapeInversion/)<br><br>

[3] KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control(【3D关键点】关键点变形器：用于形状控制的无监督三维关键点发现)<br>
[paper](https://arxiv.org/abs/2104.11224) | [project](http://tomasjakab.github.io/KeypointDeformer)<br><br>

[2] A Deep Emulator for Secondary Motion of 3D Characters(三维角色二次运动的深度仿真器)<br>
[paper](https://arxiv.org/abs/2103.01261)<br><br>

[1] 3D CNNs with Adaptive Temporal Feature Resolutions(具有自适应时间特征分辨率的3D CNN)<br>
[paper](https://arxiv.org/abs/2011.08652)<br><br>

<a name="3DPC"/> 

### 点云(Point Cloud)

[32] Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation([点云分割]用于 LiDAR 分割的圆柱形和非对称 3D 卷积网络)<br>
[paper](https://arxiv.org/abs/2011.10033) | [code](https://github.com/xinge008/Cylinder3D)<br><br>

[31] DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution(DyCo3D：通过动态卷积对 3D 点云进行稳健的实例分割)<br>
[paper](https://arxiv.org/abs/2011.13328) | [code](https://git.io/DyCo3D)<br><br>

[30] PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clouds(PV-RAFT：用于点云场景流估计的点体素相关字段)<br>
[paper](https://arxiv.org/abs/2012.00987)<br><br>

[29] PWCLO-Net: Deep LiDAR Odometry in 3D Point Clouds Using Hierarchical Embedding Mask Optimization(PWCLO-Net：使用分层嵌入掩码优化的 3D 点云中的深度激光雷达测距)<br>
[paper](https://arxiv.org/abs/2012.00972) | [code](https://github.com/IRMVLab/PWCLONet)<br><br>

[28] Omni-supervised Point Cloud Segmentation via Gradual Receptive Field Component Reasoning(通过渐进感受野分量推理的全方位监督点云分割)<br>
[paper](https://arxiv.org/abs/2105.10203) | [code](https://github.com/azuki-miho/RFCR)<br><br>

[27] PMP-Net: Point Cloud Completion by Learning Multi-step Point Moving Paths(PMP-Net：通过学习多步点移动路径来完成点云)<br>
[paper](https://arxiv.org/abs/2012.03408)<br><br>

[26] VoxelContext-Net: An Octree based Framework for Point Cloud Compression(VoxelContext-Net：基于Octree的点云压缩框架)<br>
[paper](https://arxiv.org/abs/2105.02158)<br><br>

[25] Variational Relational Point Completion Network(变分关系点完备网络)<br>
[paper](https://arxiv.org/abs/2104.10154) | [project](https://paul007pl.github.io/projects/VRCNet.html)<br><br>

[24] SCALE: Modeling Clothed Humans with a Surface Codec of Articulated Local Elements(规模：使用关节局部元素的表面编解码器模拟穿衣服的人)<br>
[paper](https://arxiv.org/abs/2104.07660) | [code](https://qianlim.github.io/SCALE)<br><br>

[23] RPSRNet: End-to-End Trainable Rigid Point Set Registration Network using Barnes-Hut 2D-Tree Representation(RPSRNet：使用Barnes-Hut二维树表示法的端到端可训练刚性点集配准网络)<br>
[paper](https://arxiv.org/abs/2104.05328)<br><br>

[22] View-Guided Point Cloud Completion(视图引导的点云完成)<br>
[paper](https://arxiv.org/abs/2104.05666)<br><br>

[21] DeepI2P: Image-to-Point Cloud Registration via Deep Classification(通过深度分类的图像到点云配准)<br>
[paper](https://arxiv.org/abs/2104.03501) | [code](https://github.com/lijx10/DeepI2P)<br><br>

[20] FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds(FESTA：场景点云通过时空注意进行光流估计)<br>
[paper](https://arxiv.org/abs/2104.00798)<br><br>

[19] Denoise and Contrast for Category Agnostic Shape Completion(类别不可知形状完成的消噪和对比度)<br>
[paper](https://arxiv.org/abs/2103.16671)<br><br>

[18] Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation(无提案的LiDAR点云全景分割)<br>
[paper](https://arxiv.org/pdf/2103.14962.pdf)<br><br>

[17] ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning(ReAgent：使用模仿和强化学习进行点云配准)<br>
[paper](https://arxiv.org/abs/2103.15231)<br><br>

[16] Equivariant Point Network for 3D Point Cloud Analysis(等变点网络进行3D点云分析)<br>
[paper](https://arxiv.org/abs/2103.14147)<br><br> 

[15] PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds(PAConv：点云上具有动态内核组装的位置自适应卷积)<br>
[paper](https://arxiv.org/abs/2103.14635) | [code](https://github.com/CVMI-Lab/PAConv)<br><br> 

[14] Skeleton Merger: an Unsupervised Aligned Keypoint Detector(骨架合并：无监督的对准关键点检测器)<br>
[paper](https://arxiv.org/pdf/2103.10814.pdf) | [code](https://github.com/eliphatfs/SkeletonMerger)<br><br>

[13] Cycle4Completion: Unpaired Point Cloud Completion using Cycle Transformation with Missing Region Coding(使用缺失区域编码的循环变换完成不成对的点云)<br>
[paper](https://arxiv.org/abs/2103.07838)<br><br>

[12] Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion(通过双边扩充和自适应融合对实点云场景进行语义分割)<br>
[paper](https://arxiv.org/abs/2103.07074)<br><br>

[11] How Privacy-Preserving are Line Clouds? Recovering Scene Details from 3D Lines(线云如何保护隐私？ 从3D线中恢复场景详细信息)<br>
[paper](https://arxiv.org/pdf/2103.05086.pdf) | [code](https://github.com/kunalchelani/Line2Point)<br><br>

[10] PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency(使用深度空间一致性进行稳健的点云配准)<br>
[paper](https://arxiv.org/abs/2103.05465) | [code](https://github.com/XuyangBai/PointDSC)<br><br>

[9] Robust Point Cloud Registration Framework Based on Deep Graph Matching(基于深度图匹配的鲁棒点云配准框架)<br>
[paper](https://arxiv.org/pdf/2103.04256.pdf) | [code](https://github.com/fukexue/RGM)<br><br>

[8] TPCN: Temporal Point Cloud Networks for Motion Forecasting(面向运动预测的时态点云网络)<br>
[paper](https://arxiv.org/abs/2103.03067)<br><br>

[7] PointGuard: Provably Robust 3D Point Cloud Classification(可证明稳健的三维点云分类)<br>
[paper](https://arxiv.org/abs/2103.03046)<br><br>

[6] Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges(走向城市规模3D点云的语义分割：数据集，基准和挑战)<br>
[paper](https://arxiv.org/abs/2009.03137) | [code](https://github.com/QingyongHu/SensatUrban)<br><br>

[5] SpinNet: Learning a General Surface Descriptor for 3D Point Cloud Registration(SpinNet：学习用于3D点云配准的通用表面描述符)<br>
[paper](https://arxiv.org/abs/2011.12149) | [code](https://github.com/QingyongHu/SpinNet)<br><br>

[4] MultiBodySync: Multi-Body Segmentation and Motion Estimation via 3D Scan Synchronization(通过3D扫描同步进行多主体分割和运动估计)<br>
[paper](https://arxiv.org/pdf/2101.06605.pdf) | [code](https://github.com/huangjh-pub/multibody-sync)<br><br>

[3] Diffusion Probabilistic Models for 3D Point Cloud Generation(三维点云生成的扩散概率模型)<br>
[paper](https://arxiv.org/abs/2103.01458) | [code](https://github.com/luost26/diffusion-point-cloud)<br><br>

[2] Style-based Point Generator with Adversarial Rendering for Point Cloud Completion(用于点云补全的对抗性渲染基于样式的点生成器)<br>
[paper](https://arxiv.org/abs/2103.02535)<br><br>

[1] PREDATOR: Registration of 3D Point Clouds with Low Overlap(预测器：低重叠的3D点云的配准)<br>
[paper](https://arxiv.org/pdf/2011.13005.pdf) | [code](https://github.com/ShengyuH/OverlapPredator) | [project](https://overlappredator.github.io/)<br><br>


<a name="3DReconstruction"/> 

### 三维重建(3D Reconstruction)

[25] 3D-NVS: A 3D Supervision Approach for Next View Selection(3D-NVS：下一个视图选择的 3D 监督方法)<br>
[paper](https://arxiv.org/abs/2012.01743)<br><br>

[24] Multi-view 3D Reconstruction of a Texture-less Smooth Surface of Unknown Generic Reflectance(未知通用反射率的无纹理光滑表面的多视图 3D 重建)<br>
[paper](https://arxiv.org/abs/2105.11599)<br><br>

[23] Sketch2Model: View-Aware 3D Modeling from Single Free-Hand Sketches(Sketch2Model：从单个徒手草图开始的具有视图感知能力的3D建模)<br>
[paper](https://arxiv.org/abs/2105.06663)<br><br>

[22] From Points to Multi-Object 3D Reconstruction(从点到多对象3D重建)<br>
[paper](https://arxiv.org/abs/2012.11575)<br><br>

[21] End-to-End Human Pose and Mesh Reconstruction with Transformers(使用Transformer进行端到端的人体姿势和网格重建)<br>
[paper](https://arxiv.org/abs/2012.09760)<br><br>

[20] DECOR-GAN: 3D Shape Detailization by Conditional Refinement(DECOR-GAN：通过条件细化实现3D形状细化)<br>
[paper](https://arxiv.org/abs/2012.09159) | [code](https://github.com/czq142857/DECOR-GAN)<br><br>

[19] pixelNeRF: Neural Radiance Fields from One or Few Images(pixelNeRF：一幅或几幅图像的神经辐射场)<br>
[paper](https://arxiv.org/abs/2012.02190) | [project](https://alexyu.net/pixelnerf)<br><br>

[18] LASR: Learning Articulated Shape Reconstruction from a Monocular Video(LASR：从单眼视频中学习关节形状的重建)<br>
[paper](https://arxiv.org/abs/2105.02976) | [code](http://lasr-google.github.io/)<br><br>

[17] Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images(重访长方体：学习适合单个RGB图像的稳健3D形状)<br>
[paper](https://arxiv.org/abs/2105.02047)<br><br>

[16] Multi-person Implicit Reconstruction from a Single Image(从单个图像进行多人隐式重建)<br>
[paper](https://arxiv.org/abs/2104.09283)<br><br>

[15] CodedStereo: Learned Phase Masks for Large Depth-of-field Stereo(CodedStereo：为大景深立体声而设计的相位掩模)<br>
[paper](https://arxiv.org/abs/2104.04641)<br><br>

[14] StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision(通过立体视觉进行深度感知的布景人类数字化)<br>
[paper](https://arxiv.org/abs/2104.05289) | [project](http://crishy1995.github.io/StereoPIFuProject)<br><br>

[13] Global Transport for Fluid Reconstruction with Learned Self-Supervision(具有自学指导的流体重建的全球运输)<br>
[paper](https://arxiv.org/abs/2104.06031) | [code](https://github.com/tum-pbs/Global-Flow-Transport)<br><br>

[12] Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction(全面了解通用对象：建模，分段和重构)<br>
[paper](https://arxiv.org/abs/2104.00858)<br><br>

[11] Reconstructing 3D Human Pose by Watching Humans in the Mirror(通过照镜子中的人来重建3D人的姿势)<br>
[paper](https://arxiv.org/abs/2104.00340) | [project](https://zju3dv.github.io/Mirrored-Human/)<br><br>

[10] Fostering Generalization in Single-view 3D Reconstruction by Learning a Hierarchy of Local and Global Shape Priors(通过学习局部和全局形状先验的层次结构，促进单视图3D重构中的泛化)<br>
[paper](https://arxiv.org/abs/2104.00476)<br><br>

[9] NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video(单目视频的实时相干3D重建)<br>
[paper](https://arxiv.org/abs/2104.00681) | [project](https://zju3dv.github.io/neuralrecon/)<br><br>

[8] Learning Parallel Dense Correspondence from Spatio-Temporal Descriptors for Efficient and Robust 4D Reconstruction(从时空描述符中学习并行密集对应，以进行有效且鲁棒的4D重建)<br>
[paper](https://arxiv.org/abs/2103.16341) | [code](https://github.com/tangjiapeng/LPDC-Net)<br><br>

[7] POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture(用于单视图人体体积捕获的姿势引导选择性融合)<br>
[paper](https://arxiv.org/abs/2103.15331) | [project](http://www.liuyebin.com/posefusion/posefusion.html)<br><br>

[6] Deep Implicit Moving Least-Squares Functions for 3D Reconstruction(用于3D重构的深层隐式移动最小二乘函数)<br>
[paper](https://arxiv.org/abs/2103.12266) | [code](https://github.com/Andy97/DeepMLS)<br><br>

[5] Model-based 3D Hand Reconstruction via Self-Supervised Learning(通过自我监督学习进行基于模型的3D手重建)<br>
[paper](https://arxiv.org/pdf/2103.11703)<br><br>

[4] 3DCaricShop: A Dataset and A Baseline Method for Single-view 3D Caricature Face Reconstruction(单视图3D漫画面部重建的数据集和基线方法)<br>
[paper](https://arxiv.org/pdf/2103.08204.pdf) | [project](https://qiuyuda.github.io/3DCaricShop/)<br><br>

[3] Learning Compositional Representation for 4D Captures with Neural ODE(使用神经ODE学习4D捕捉的合成表示)<br>
[paper](https://arxiv.org/pdf/2103.08271.pdf)<br><br>

[2] SMPLicit: Topology-aware Generative Model for Clothed People(穿衣服的人的拓扑感知生成模型)<br>
[paper](https://arxiv.org/pdf/2103.06871.pdf) | [code](http://www.iri.upc.edu/people/ecorona/smplicit/)<br><br>

[1] PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective Crop Layers（具有透视作物层的3D姿势的几何感知神经重建）<br>
[paper](https://arxiv.org/abs/2011.13607)<br><br>

<br>

<a name="ModelCompression"/> 

## 模型压缩(Model Compression)

[6] Towards Compact CNNs via Collaborative Compression(通过协同压缩迈向紧凑型CNN)<br>
[paper](https://arxiv.org/abs/2105.11228)<br><br>

[5] Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation(联合DetNAS：用NAS、剪枝和动态蒸馏升级你的探测器)<br>
[paper](https://arxiv.org/abs/2105.12971)<br><br>

[4] Skip-Convolutions for Efficient Video Processing(跳过卷积以实现高效的视频处理)<br>
[paper](https://arxiv.org/abs/2104.11487)<br><br>

[3] Content-Aware GAN Compression(内容感知GAN压缩)<br>
[paper](https://arxiv.org/abs/2104.02244)<br><br>

[2] Dynamic Slimmable Network(动态可压缩网络)<br>
[paper](https://arxiv.org/abs/2103.13258) | [code](https://github.com/changlin31/DS-Net)<br><br>

[1] Learning Student Networks in the Wild(a model compression and acceleration technique that needs no original training data)<br>
[paper](https://arxiv.org/pdf/1904.01186.pdf) | [code](https://github.com/huawei-noah/DAFL)<br>
Explainer: [Huawei Noah's Ark Lab proposes a data-free network compression technique](https://zhuanlan.zhihu.com/p/81277796)<br><br>

<a name="KnowledgeDistillation"/> 

### 知识蒸馏(Knowledge Distillation)

[13] Positive-Unlabeled Data Purification in the Wild for Object Detection(野外检测对象的阳性无标签数据提纯)<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Guo_Positive-Unlabeled_Data_Purification_in_the_Wild_for_Object_Detection_CVPR_2021_paper.pdf)<br><br>

[12] Wasserstein Contrastive Representation Distillation(Wasserstein对比表示蒸馏)<br>
[paper](https://arxiv.org/abs/2012.08674)<br><br>

[11] Distilling Knowledge via Knowledge Review(通过知识回顾提炼知识)<br>
[paper](https://arxiv.org/abs/2104.09044) | [code](https://github.com/Jia-Research-Lab/ReviewKD)<br><br>

[10] 3D-to-2D Distillation for Indoor Scene Parsing(用于室内场景解析的3D到2D蒸馏)<br>
[paper](https://arxiv.org/abs/2104.02243)<br><br>

[9] Complementary Relation Contrastive Distillation(互补关系对比蒸馏)<br>
[paper](https://arxiv.org/abs/2103.16367)<br><br>

[8] Distilling Object Detectors via Decoupled Features(通过解耦功能蒸馏物体检测器)<br>
[paper](https://arxiv.org/abs/2103.14475) | [code](https://github.com/ggjy/DeFeat.pytorch)<br><br>

[7] Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation(通过自学来完善自己：通过自我蒸馏提炼特征)<br>
[paper](https://arxiv.org/pdf/2103.08273.pdf) | [code](https://github.com/MingiJi/FRSKD)<br><br>

[6] Knowledge Evolution in Neural Networks(神经网络中的知识进化)<br>
[paper](https://arxiv.org/pdf/2103.05152.pdf) | [code](https://github.com/ahmdtaha/knowledge_evolution)<br><br>

[5] Semantic-aware Knowledge Distillation for Few-Shot Class-Incremental Learning(少班级增量学习的语义感知知识蒸馏)<br>
[paper](https://arxiv.org/abs/2103.04059)<br><br>

[4] Teachers Do More Than Teach: Compressing Image-to-Image Models<br>
[paper](https://arxiv.org/abs/2103.03467) | [code](https://github.com/snap-research/CAT)<br><br>

[3] General Instance Distillation for Object Detection(通用实例蒸馏技术在目标检测中的应用)<br>
[paper](https://arxiv.org/abs/2103.02340)<br><br>

[2] Multiresolution Knowledge Distillation for Anomaly Detection(用于异常检测的多分辨率知识蒸馏)<br>
[paper](https://arxiv.org/abs/2011.11108)<br><br>

[1] Distilling Object Detectors via Decoupled Features(distillation that separates foreground and background)<br><br>


<a name="Pruning"/> 

### 剪枝(Pruning)

[3] Convolutional Neural Network Pruning with Structural Redundancy Reduction(减少结构冗余的卷积神经网络修剪)<br>
[paper](https://arxiv.org/abs/2104.03438)<br><br>

[2] Neural Response Interpretation through the Lens of Critical Pathways(关键途径对神经反应的解释)<br>
[paper](https://arxiv.org/abs/2103.16886) | [code1](https://github.com/CAMP-eXplain-AI/PathwayGrad) | [code2](https://github.com/CAMP-eXplain-AI/RoarTorch)<br><br>

[1] Manifold Regularized Dynamic Network Pruning(流形规则化动态网络剪枝)<br>
[paper](https://arxiv.org/pdf/2103.05861.pdf)<br><br>

<a name="Quantization"/> 

### 量化(Quantization)

[3] Network Quantization with Element-wise Gradient Scaling(逐元素梯度缩放的网络量化)<br>
[paper](https://arxiv.org/abs/2104.00903)<br><br>

[2] Zero-shot Adversarial Quantization(零样本对抗量化)<br>
[paper](https://arxiv.org/abs/2103.15263) | [code](https://git.io/Jqc0y)<br><br>

[1] Learnable Companding Quantization for Accurate Low-bit Neural Networks(精确低位神经网络的可学习压扩量化)<br>
[paper](https://arxiv.org/pdf/2103.07156.pdf)<br><br>

<br>

<a name="NNS"/> 

## 神经网络结构设计(Neural Network Structure Design)

[15] Improving Accuracy of Binary Neural Networks using Unbalanced Activation Distribution(使用不平衡激活分布提高二元神经网络的准确性)<br>
[paper](https://arxiv.org/abs/2012.00938)<br><br>

[14] Heterogeneous Grid Convolution for Adaptive, Efficient, and Controllable Computation(用于自适应、高效和可控计算的异构网格卷积)<br>
[paper](https://arxiv.org/abs/2104.11176)<br><br>

[13] AsymmNet: Towards ultralight convolution neural networks using asymmetrical bottlenecks(AsymmNet：利用不对称瓶颈迈向超轻型卷积神经网络)<br>
[paper](https://arxiv.org/abs/2104.07770) | [code](https://github.com/Spark001/AsymmNet)<br><br>

[12] CondenseNet V2: Sparse Feature Reactivation for Deep Networks(CondenseNet V2：深度网络的稀疏功能重新激活)<br>
[paper](https://arxiv.org/abs/2104.04382)<br><br>

[11] Convolutional Hough Matching Networks(卷积霍夫匹配网络)<br>
[paper](https://arxiv.org/abs/2103.16831)<br><br>

[10] Capsule Network is Not More Robust than Convolutional Network(胶囊网络并不比卷积网络更健壮)<br>
[paper](https://arxiv.org/abs/2103.15459)<br><br>

[9] Diverse Branch Block: Building a Convolution as an Inception-like Unit(多元分支块：将卷积构建为类似初始的单位)<br>
[paper](https://arxiv.org/abs/2103.13425) | [code](https://github.com/DingXiaoH/DiverseBranchBlock)<br><br>

[8] Scaling Local Self-Attention For Parameter Efficient Visual Backbones(扩展局部自注意力以获得有效的参数视觉主干)<br>
[paper](https://arxiv.org/pdf/2103.12731.pdf)<br><br>

[7] Fast and Accurate Model Scaling(快速准确的模型缩放)<br>
[paper](https://arxiv.org/abs/2103.06877)<br><br>

[6] Involution: Inverting the Inherence of Convolution for Visual Recognition(反转卷积的固有性以进行视觉识别)<br>
[paper](https://arxiv.org/abs/2103.06255) | [code](https://github.com/d-li14/involution)<br><br>

[5] Inception Convolution with Efficient Dilation Search(具有有效膨胀搜索的初始卷积)<br>
[paper](https://arxiv.org/pdf/2012.13587.pdf) | [code](https://github.com/yifan123/IC-Conv) | [explainer: Inception convolution](https://zhuanlan.zhihu.com/p/354194188)<br><br>

[4] Coordinate Attention for Efficient Mobile Network Design(协调注意力以实现高效的移动网络设计)<br>
[paper](https://arxiv.org/abs/2103.02907)<br><br>

[3] Rethinking Channel Dimensions for Efficient Model Design(重新考虑通道尺寸以进行有效的模型设计)<br>
[paper](https://arxiv.org/abs/2007.00992) | [code](https://github.com/clovaai/rexnet)<br><br>

[2] Inverting the Inherence of Convolution for Visual Recognition（颠倒卷积的固有性以进行视觉识别）<br><br>

[1] RepVGG: Making VGG-style ConvNets Great Again<br>
[paper](https://arxiv.org/abs/2101.03697) | [code](https://github.com/megvii-model/RepVGG)<br>
Explainer: [RepVGG: a minimalist architecture with SOTA performance that makes VGG-style models great again](https://zhuanlan.zhihu.com/p/344324470)<br><br>

<a name="Transformer"/> 

### Transformer

[2] Transformer Interpretability Beyond Attention Visualization(注意力可视化之外的Transformer可解释性)<br>
[paper](https://arxiv.org/pdf/2012.09838.pdf) | [code](https://github.com/hila-chefer/Transformer-Explainability)<br><br>

[1] Pre-Trained Image Processing Transformer(a pre-trained model for low-level vision)<br>
[paper](https://arxiv.org/pdf/2012.00364.pdf) | [explainer: Transformer takes another stronghold! It tops several low-level vision leaderboards as Peking University, Huawei and others jointly propose the pre-trained model IPT](https://mp.weixin.qq.com/s?__biz=MzI5MDUyMDIxNA==&mid=2247524529&idx=1&sn=e39e67981b2afd9a5369cc843ddf28fe&chksm=ec1c8d48db6b045e7cf2d37c5633da8d3caf5e53178fe6df0913f2a04bc143fcc2e504d6f9be&token=2008688100&lang=zh_CN#rd)<br><br>

<a name="GNN"/> 

### 图神经网络(GNN)

[3] A Peek Into the Reasoning of Neural Networks: Interpreting with Structural Visual Concepts(窥探神经网络的推理：解读结构视觉概念)<br>
[paper](https://arxiv.org/abs/2105.00290)<br><br>

[2] Quantifying Explainers of Graph Neural Networks in Computational Pathology(计算病理学中图神经网络的量化解释器)<br>
[paper](https://arxiv.org/pdf/2011.12646.pdf)<br><br>

[1] Sequential Graph Convolutional Network for Active Learning(主动学习的顺序图卷积网络)<br>
[paper](https://arxiv.org/pdf/2006.10219.pdf)<br><br>

<a name="NAS"/> 

### 神经网络架构搜索(NAS)

[16] FP-NAS: Fast Probabilistic Neural Architecture Search(FP-NAS：快速概率神经架构搜索)<br>
[paper](https://arxiv.org/abs/2011.10949)<br><br>

[15] ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search(ViPNAS：通过神经架构搜索进行高效的视频姿态估计)<br>
[paper](https://arxiv.org/abs/2105.10154)<br><br>

[14] BCNet: Searching for Network Width with Bilaterally Coupled Network(BCNet：用双边耦合网络搜索网络宽度)<br>
[paper](https://arxiv.org/abs/2105.10533)<br><br>

[13] TransNAS-Bench-101: Improving Transferability and Generalizability of Cross-Task Neural Architecture Search(TransNAS-Bench-101：提高跨任务神经架构搜索的可转移性和通用性)<br>
[paper](https://arxiv.org/abs/2105.11871)<br><br>

[12] Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation(联合DetNAS：用NAS、剪枝和动态蒸馏升级你的探测器)<br>
[paper](https://arxiv.org/abs/2105.12971)<br><br>

[11] Landmark Regularization: Ranking Guided Super-Net Training in Neural Architecture Search<br>
[paper](https://arxiv.org/abs/2104.05309)<br><br>

[10] NetAdaptV2: Efficient Neural Architecture Search with Fast Super-Network Training and Architecture Optimization<br>
[paper](https://arxiv.org/abs/2104.00031) | [project](http://netadapt.mit.edu/)<br><br>

[9] One-Shot Neural Ensemble Architecture Search by Diversity-Guided Search Space Shrinking<br>
[paper](https://arxiv.org/abs/2104.00597) | [code](https://github.com/researchmm/NEAS)<br><br>

[8] Dynamic Slimmable Network<br>
[paper](https://arxiv.org/abs/2103.13258) | [code](https://github.com/changlin31/DS-Net)<br><br>

[7] Prioritized Architecture Sampling with Monto-Carlo Tree Search<br>
[paper](https://arxiv.org/pdf/2103.11922.pdf) | [code](https://github.com/xiusu/NAS-Bench-Macro)<br><br>

[6] Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator<br>
[paper](https://arxiv.org/abs/2103.07289) | [code](https://github.com/eric8607242/SGNAS)<br><br>

[5] Contrastive Neural Architecture Search with Neural Architecture Comparators<br>
[paper](https://arxiv.org/abs/2103.05471) | [code](https://github.com/chenyaofo/CTNAS)<br><br>

[4] OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection<br>
[paper](https://arxiv.org/abs/2103.04507) | [code](https://github.com/VDIGPKU/OPANAS)<br><br>

[3] AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling<br>
[paper](https://arxiv.org/pdf/2011.09011.pdf)<br><br>

[2] ReNAS: Relativistic Evaluation of Neural Architecture Search (the importance of ranking loss in NAS predictors)<br>
[paper](https://arxiv.org/pdf/1910.01523.pdf)<br><br>

[1] HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens (reducing the cost of NAS)<br>
[paper](https://arxiv.org/pdf/2005.14446.pdf)<br><br>


<br>

<a name="DataProcessing"/> 

## Data Processing

<a name="DataAugmentation"/> 

### Data Augmentation

[3] A Fourier-based Framework for Domain Generalization<br>
[paper](https://arxiv.org/abs/2105.11120)<br><br>

[2] AutoDO: Robust AutoAugment for Biased Data with Label Noise via Scalable Probabilistic Implicit Differentiation<br>
[paper](https://arxiv.org/abs/2103.05863)<br><br>

[1] KeepAugment: A Simple Information-Preserving Data Augmentation<br>
[paper](https://arxiv.org/pdf/2011.11778.pdf)<br><br>

<a name="RepresentationLearning"/> 

### Representation Learning

[19] Task Programming: Learning Data Efficient Behavior Representations<br>
[paper](https://arxiv.org/abs/2011.13917) | [code](https://github.com/neuroethology/TREBA) | [project](https://sites.google.com/view/task-programming)<br><br>

[18] Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization<br>
[paper](https://arxiv.org/abs/2012.01405) | [code](https://github.com/google-research/google-research/tree/master/poem)<br><br>

[17] SSAN: Separable Self-Attention Network for Video Representation Learning<br>
[paper](https://arxiv.org/abs/2105.13033)<br><br>

[16] Generative Interventions for Causal Learning<br>
[paper](https://arxiv.org/abs/2012.12265)<br><br>

[15] Representation Learning via Global Temporal Alignment and Cycle-Consistency<br>
[paper](https://arxiv.org/abs/2105.05217)<br><br>

[14] Multi-Perspective LSTM for Joint Visual Representation Learning<br>
[paper](https://arxiv.org/abs/2105.02802) | [code](https://github.com/arsm/MPLSTM)<br><br>

[13] Unsupervised Visual Representation Learning by Tracking Patches in Video<br>
[paper](https://arxiv.org/abs/2105.02545) | [code](http://github.com/microsoft/CtP)<br><br>

[12] A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning<br>
[paper](https://arxiv.org/abs/2104.14558)<br><br>

[11] Where and What? Examining Interpretable Disentangled Representations<br>
[paper](https://arxiv.org/abs/2104.05622)<br><br>

[10] Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning<br>
[paper](https://arxiv.org/abs/2104.03135)<br><br>

[9] Self-supervised Video Representation Learning by Context and Motion Decoupling<br>
[paper](https://arxiv.org/abs/2104.00862)<br><br>

[8] Jigsaw Clustering for Unsupervised Visual Representation Learning<br>
[paper](https://arxiv.org/abs/2104.00323) | [code](https://github.com/Jia-Research-Lab/JigsawClustering)<br><br>

[7] Learning by Aligning Videos in Time (video representation)<br>
[paper](https://arxiv.org/abs/2103.17260)<br><br>

[6] Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting<br>
[paper](https://arxiv.org/abs/2103.13716) | [code](https://github.com/AyanKumarBhunia/Self-Supervised-Learning-for-Sketch)<br><br>

[5] Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks<br>
[paper](https://arxiv.org/pdf/2103.10429.pdf)<br><br>

[4] VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples<br>
[paper](https://arxiv.org/abs/2103.05905)<br><br>

[3] Spatially Consistent Representation Learning<br>
[paper](https://arxiv.org/abs/2103.06122)<br><br>

[2] Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning<br>
[paper](https://arxiv.org/pdf/2009.05769.pdf) | [code](https://github.com/FingerRec/BE) | [project](https://fingerrec.github.io/index_files/jinpeng/papers/CVPR2021/project_website.html) | [explainer](https://zhuanlan.zhihu.com/p/355232006)<br><br>

[1] VirTex: Learning Visual Representations from Textual Annotations<br>
[paper](https://arxiv.org/abs/2006.06666) | [code](https://github.com/kdexd/virtex)<br><br>

<a name="BatchNormalization"/> 

### Normalization/Regularization (Batch Normalization)

[3] Adaptive Consistency Regularization for Semi-Supervised Transfer Learning<br>
[paper](https://arxiv.org/abs/2103.02193) | [code](https://github.com/SHI-Labs/Semi-Supervised-Transfer-Learning)<br><br>

[2] Meta Batch-Instance Normalization for Generalizable Person Re-Identification<br>
[paper](https://arxiv.org/abs/2011.14670)<br><br>

[1] Representative Batch Normalization with Feature Calibration<br><br>

<a name="ImageClustering"/> 

### Image Clustering

[4] Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes<br>
[paper](https://arxiv.org/abs/2103.13225) | [code&project](https://sstzal.github.io/STAR-FC/)<br><br>

[3] COMPLETER: Incomplete Multi-view Clustering via Contrastive Prediction<br>
[paper](http://pengxi.me/wp-content/uploads/2021/03/2021CVPR-completer.pdf) | [code](https://github.com/XLearning-SCU/2021-CVPR-Completer)<br><br>

[2] Improving Unsupervised Image Clustering With Robust Learning<br>
[paper](https://arxiv.org/abs/2012.11150) | [code](https://github.com/deu30303/RUC)<br><br>

[1] Reconsidering Representation Alignment for Multi-view Clustering<br>
[paper](https://arxiv.org/abs/2103.07738) | [code](https://github.com/DanielTrosten/mvc)<br><br>


<a name="ImageCompression"/> 

### Image Compression

[4] Learning Scalable ℓ∞-constrained Near-lossless Image Compression via Joint Lossy Image and Residual Compression<br>
[paper](https://arxiv.org/abs/2103.17015) | [code](https://github.com/BYchao100/Scalable-Near-lossless-Image-Compression)<br><br>

[3] Checkerboard Context Model for Efficient Learned Image Compression<br>
[paper](https://arxiv.org/abs/2103.15306)<br><br>

[2] Slimmable Compressive Autoencoders for Practical Neural Image Compression<br>
[paper](https://arxiv.org/abs/2103.15726)<br><br>

[1] Attention-guided Image Compression by Deep Reconstruction of Compressive Sensed Saliency Skeleton<br>
[paper](https://arxiv.org/abs/2103.15368)<br><br>

<a name="AnomalyDetection"/> 

### Anomaly Detection

[3] MOS: Towards Scaling Out-of-distribution Detection for Large Semantic Space<br>
[paper](https://arxiv.org/abs/2105.01879)<br><br>

[2] MOOD: Multi-level Out-of-distribution Detection<br>
[paper](https://arxiv.org/abs/2104.14726)<br><br>

[1] Learning Placeholders for Open-Set Recognition<br>
[paper](https://arxiv.org/abs/2103.15086)<br><br>


<br>

<a name="ModelTraining"/> 

## Model Training/Generalization

[8] A Bop and Beyond: A Second Order Optimizer for Binarized Neural Networks (optimization algorithm)<br>
[paper](https://arxiv.org/abs/2104.05124)<br><br>

[7] Simpler Certified Radius Maximization by Propagating Covariances<br>
[paper](https://arxiv.org/abs/2104.05888) | [video](https://youtu.be/m1ya2oNf5iE)<br><br>

[6] Differentiable Patch Selection for Image Recognition<br>
[paper](https://arxiv.org/abs/2104.03059) | [code](https://github.com/google-research/google-research/tree/master/ptopk_patch_selection/)<br><br>

[5] Towards Evaluating and Training Verifiably Robust Neural Networks<br>
[paper](https://arxiv.org/abs/2104.00447) | [code](https://github.com/ZhaoyangLyu/VerifiablyRobustNN)<br><br>

[4] Student-Teacher Learning from Clean Inputs to Noisy Inputs<br>
[paper](https://arxiv.org/pdf/2103.07600.pdf)<br><br>

[3] Uncertainty-guided Model Generalization to Unseen Domains<br>
[paper](https://arxiv.org/abs/2103.07531)<br><br>

[2] Knowledge Evolution in Neural Networks<br>
[paper](https://arxiv.org/pdf/2103.05152.pdf) | [code](https://github.com/ahmdtaha/knowledge_evolution)<br><br>

[1] PGT: A Progressive Method for Training Models on Long Videos<br>
[paper](https://arxiv.org/pdf/2103.11313.pdf) | [code](https://github.com/BoPang1996/PGT)<br><br>

<a name="NoisyLabel"/> 

### Noisy Label

[3] Correlated Input-Dependent Label Noise in Large-Scale Image Classification<br>
[paper](https://arxiv.org/abs/2105.10305)<br><br>

[2] A Second-Order Approach to Learning with Instance-Dependent Label Noise<br>
[paper](https://arxiv.org/abs/2012.11854) | [code](https://github.com/UCSC-REAL/CAL)<br><br>

[1] Partially View-aligned Representation Learning with Noise-robust Contrastive Loss<br>
[paper](http://pengxi.me/wp-content/uploads/2021/03/2021CVPR-MvCLNwith-supp.pdf) | [code](https://github.com/XLearning-SCU/2021-CVPR-MvCLN)<br><br>

<a name="Long-Tailed"/> 

### Long-Tailed Distribution

[8] Disentangling Label Distribution for Long-tailed Visual Recognition<br>
[paper](https://arxiv.org/abs/2012.00321)<br><br>

[7] Adversarial Robustness under Long-Tailed Distribution<br>
[paper](https://arxiv.org/abs/2104.02703) | [code](https://github.com/wutong16/Adversarial_Long-Tail)<br><br>

[6] Adaptive Class Suppression Loss for Long-Tail Object Detection<br>
[paper](https://arxiv.org/abs/2104.00885) | [code](https://github.com/CASIA-IVA-Lab/ACSL)<br><br>

[5] Improving Calibration for Long-Tailed Recognition<br>
[paper](https://arxiv.org/abs/2104.00466) | [code](https://github.com/Jia-Research-Lab/MiSLAS)<br><br>

[4] Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification<br>
[paper](https://arxiv.org/abs/2103.14267)<br><br>

[3] PML: Progressive Margin Loss for Long-tailed Age Classification<br>
[paper](https://arxiv.org/abs/2103.02140)<br><br>

[2] MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition<br>
[paper](https://arxiv.org/pdf/2103.12579.pdf)<br><br>

[1] Distribution Alignment: A Unified Framework for Long-tail Visual Recognition<br>
[paper](https://arxiv.org/abs/2103.16370) | [code](https://github.com/Megvii-BaseDetection/DisAlign)<br><br>
<br>

<a name="ModelEvaluation"/> 

## Model Evaluation

[1] Are Labels Necessary for Classifier Accuracy Evaluation? (Can a model be evaluated on a test set that has no labels?)<br>
[paper](https://arxiv.org/abs/2007.02915) | [explainer](https://zhuanlan.zhihu.com/p/328686799)<br><br>

<br>

<a name="MMLearning"/> 

## Multi-Modal Learning

[7] Distilling Audio-Visual Knowledge by Compositional Contrastive Learning<br>
[paper](https://arxiv.org/abs/2104.10955) | [code](https://github.com/yanbeic/CCL)<br><br>

[6] Cross-Modal Center Loss for 3D Cross-Modal Retrieval<br>
[paper](https://arxiv.org/abs/2008.03561) | [code](https://github.com/LongLong-Jing/Cross-Modal-Center-Loss)<br><br>

[5] Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion<br>
[paper](https://arxiv.org/abs/2103.11832)<br><br>

[4] There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge<br>
[paper](https://arxiv.org/abs/2103.01353) | [video](https://www.youtube.com/channel/UCRpM8k1GY3kD2TqCo_yKN3g) | [project](http://rl.uni-freiburg.de/research/multimodal-distill)<br><br>

[3] LaPred: Lane-Aware Prediction of Multi-Modal Future Trajectories of Dynamic Agents<br>
[paper](https://arxiv.org/abs/2104.00249)<br><br>

[2] Multimodal Motion Prediction with Stacked Transformers<br>
[paper](https://arxiv.org/pdf/2103.11624.pdf) | [code](https://decisionforce.github.io/mmTransformer)<br><br>

[1] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving<br>
[paper](https://arxiv.org/abs/2104.09224)<br><br>

<a name="Audio-VisualLearning"/> 

### Audio-visual Learning

[7] Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions<br>
[paper](https://arxiv.org/abs/2105.04489)<br><br>

[6] Visually Informed Binaural Audio Generation without Binaural Audios<br>
[paper](https://arxiv.org/abs/2104.06162) | [project](https://sheldontsui.github.io/projects/PseudoBinaural)<br><br>

[5] Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation<br>
[paper](https://arxiv.org/abs/2104.02775) | [project](https://caffnet.github.io/)<br><br>

[4] Localizing Visual Sounds the Hard Way<br>
[paper](https://arxiv.org/abs/2104.02691)<br><br>

[3] Can audio-visual integration strengthen robustness under multimodal attacks?<br>
[paper](https://arxiv.org/abs/2104.02000)<br><br>

[2] Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation<br>
[paper](https://arxiv.org/abs/2104.02026)<br><br>

[1] Positive Sample Propagation along the Audio-Visual Event Line<br>
[paper](https://arxiv.org/abs/2104.00239) | [code](https://github.com/jasongief/PSP_CVPR_2021)<br><br>


<br>
<a name="Vision-basedPrediction"/> 

## Vision-based Prediction

[12] Shared Cross-Modal Trajectory Prediction for Autonomous Driving<br>
[paper](https://arxiv.org/abs/2011.08436)<br><br>

[11] We are More than Our Joints: Predicting how 3D Bodies Move<br>
[paper](https://arxiv.org/abs/2012.00619) | [code](https://github.com/yz-cnsdqz/MOJO-release) | [project](https://yz-cnsdqz.github.io/MOJO/MOJO.html)<br><br>

[10] Interpretable Social Anchors for Human Trajectory Forecasting in Crowds<br>
[paper](https://arxiv.org/abs/2105.03136)<br><br>

[9] DriveGAN: Towards a Controllable High-Quality Neural Simulation<br>
[paper](https://arxiv.org/abs/2104.15060)<br><br>

[8] Learning Semantic-Aware Dynamics for Video Prediction<br>
[paper](https://arxiv.org/abs/2104.09762)<br><br>

[7] Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction<br>
[paper](https://arxiv.org/abs/2104.08277)<br><br>

[6] GATSBI: Generative Agent-centric Spatio-temporal Object Interaction<br>
[paper](https://arxiv.org/abs/2104.04275)<br><br>

[5] SGCN: Sparse Graph Convolution Network for Pedestrian Trajectory Prediction<br>
[paper](https://arxiv.org/abs/2104.01528)<br><br>

[4] LaPred: Lane-Aware Prediction of Multi-Modal Future Trajectories of Dynamic Agents<br>
[paper](https://arxiv.org/abs/2104.00249)<br><br>

[3] Multimodal Motion Prediction with Stacked Transformers<br>
[paper](https://arxiv.org/pdf/2103.11624.pdf) | [code](https://decisionforce.github.io/mmTransformer)<br><br>

[2] Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning<br>
[paper](https://arxiv.org/abs/2104.00924)<br><br>

[1] MotionRNN: A Flexible Model for Video Prediction with Spacetime-Varying Motions (a general video prediction model for complex spatiotemporal motion)<br>
[paper](https://arxiv.org/abs/2103.02243) | [explainer](https://zhuanlan.zhihu.com/p/355703957)<br><br>


<br>

<a name="Dataset"/> 

## Dataset

[18] VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval<br>
[paper](https://arxiv.org/abs/2011.12172) | [dataset&project](https://github.com/Jeff-Zilence/VIGOR)<br><br>

[17] Multi-shot Temporal Event Localization: a Benchmark<br>
[paper](https://arxiv.org/abs/2012.09434) | [dataset&project](https://songbai.site/muses/)<br><br>

[16] Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark<br>
[paper](https://arxiv.org/abs/2105.02440) | [dataset&code](https://github.com/VisDrone/DroneCrowd)<br><br>

[15] AGORA: Avatars in Geography Optimized for Regression Analysis<br>
[paper](https://arxiv.org/abs/2104.14643) | [project](https://agora.is.tue.mpg.de)<br><br>

[14] Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets (dataset annotation)<br>
[paper](https://arxiv.org/abs/2104.12690) | [project](https://fidler-lab.github.io/efficient-annotation-cookbook)<br><br>

[13] Learning To Count Everything (visual counting)<br>
[paper](https://arxiv.org/abs/2104.08391) | [dataset&code](https://github.com/cvlab-stonybrook/LearningToCountEverything)<br><br>

[12] DexYCB: A Benchmark for Capturing Hand Grasping of Objects<br>
[paper](https://arxiv.org/abs/2104.04631) | [dataset&code](https://dex-ycb.github.io/)<br><br>

[11] The Multi-Agent Behavior Dataset: Mouse Dyadic Social Interactions<br>
[paper](https://arxiv.org/abs/2104.02710) | [dataset](https://www.aicrowd.com/challenges/multi-agent-behavior-representation-modeling-measurement-and-applications)<br><br>

[10] Deep Animation Video Interpolation in the Wild<br>
[paper](https://arxiv.org/abs/2104.02495) | [dataset&code](https://github.com/lisiyao21/AnimeInterp/)<br><br>

[9] Towards Rolling Shutter Correction and Deblurring in Dynamic Scenes<br>
[paper](https://arxiv.org/abs/2104.01601) | [dataset&code](https://github.com/zzh-tech/RSCD)<br><br>

[8] UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles<br>
[paper](https://arxiv.org/abs/2104.00946)<br><br>

[7] Visual Semantic Role Labeling for Video Understanding (video understanding)<br>
[paper](https://arxiv.org/abs/2104.00990) | [dataset&code](http://vidsitu.org/)<br><br>

[6] Face Forensics in the Wild (face forgery dataset)<br>
[paper](https://arxiv.org/abs/2103.16076) | [dataset&code](https://github.com/tfzhou/FFIW)<br><br>

[5] Benchmarking Representation Learning for Natural World Image Collections (natural image classification)<br>
[paper](https://arxiv.org/abs/2103.16483)<br><br>

[4] Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark<br>
[paper](https://arxiv.org/abs/2103.10895) | [project&dataset](https://vap.aau.dk/sewer-ml/)<br><br>

[3] 3DCaricShop: A Dataset and A Baseline Method for Single-view 3D Caricature Face Reconstruction<br>
[paper](https://arxiv.org/pdf/2103.08204.pdf) | [project](https://qiuyuda.github.io/3DCaricShop/)<br><br>

[2] Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges<br>
[paper](https://arxiv.org/abs/2009.03137) | [code](https://github.com/QingyongHu/SensatUrban)<br><br>

[1] Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels<br>
[paper](https://arxiv.org/abs/2101.05022) | [code](https://github.com/naver-ai/relabel_imagenet)<br><br>

<br>

<a name="ActiveLearning"/> 

## Active Learning


[3] VaB-AL: Incorporating Class Imbalance and Difficulty with Variational Bayes for Active Learning<br>
[paper](https://arxiv.org/abs/2003.11249)<br><br>

[2] Multiple Instance Active Learning for Object Detection<br>
[paper](https://github.com/yuantn/MIAL/raw/master/paper.pdf) | [code](https://github.com/yuantn/MIAL)<br><br>

[1] Sequential Graph Convolutional Network for Active Learning<br>
[paper](https://arxiv.org/pdf/2006.10219.pdf)<br><br>

<br>

<a name="Few-shotLearning"/> 

## Few-shot Learning/Zero-shot Learning

[11] Few-Shot Classification with Feature Map Reconstruction Networks<br>
[paper](https://arxiv.org/abs/2012.01506) | [code](https://github.com/Tsingularity/FRN)<br><br>

[10] Learning Graph Embeddings for Compositional Zero-shot Learning<br>
[paper](https://arxiv.org/abs/2102.01987) | [code](https://github.com/ExplainableML/czsl)<br><br>

[9] Self-Guided and Cross-Guided Learning for Few-Shot Segmentation<br>
[paper](https://arxiv.org/abs/2103.16129)<br><br>

[8] Contrastive Embedding for Generalized Zero-Shot Learning<br>
[paper](https://arxiv.org/abs/2103.16173) | [code](https://github.com/Hanzy1996/CE-GZSL)<br><br>

[7] Learning Dynamic Alignment via Meta-filter for Few-shot Learning<br>
[paper](https://arxiv.org/abs/2103.13582)<br><br>

[6] Goal-Oriented Gaze Estimation for Zero-Shot Learning<br>
[paper](https://arxiv.org/abs/2103.03433) | [code](https://github.com/osierboy/GEM-ZSL)<br><br>

[5] Few-Shot Segmentation Without Meta-Learning: A Good Transductive Inference Is All You Need?<br>
[paper](https://arxiv.org/abs/2012.06166) | [code](https://github.com/mboudiaf/RePRI-for-Few-Shot-Segmentation)<br><br>

[4] Counterfactual Zero-Shot and Open-Set Visual Recognition<br>
[paper](https://arxiv.org/abs/2103.00887) | [code](https://github.com/yue-zhongqi/gcm-cf)<br><br>

[3] Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection<br>
[paper](https://arxiv.org/abs/2103.01903)<br><br>

[2] Few-shot Open-set Recognition by Transformation Consistency<br><br>

[1] Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning<br>
[paper](https://arxiv.org/abs/2103.01315)<br><br>

<br>

<a name="ContinualLearning"/> 

## Continual Learning/Life-long Learning

[4] Rectification-based Knowledge Retention for Continual Learning<br>
[paper](https://arxiv.org/abs/2103.16597)<br><br>

[3] Rainbow Memory: Continual Learning with a Memory of Diverse Samples<br>
[paper](https://arxiv.org/abs/2103.17230) | [code](https://github.com/clovaai/rainbow-memory)<br><br>

[2] Efficient Feature Transformations for Discriminative and Generative Continual Learning<br>
[paper](https://arxiv.org/abs/2103.13558)<br><br>

[1] Learning the Superpixel in a Non-iterative and Lifelong Manner<br>
[paper](https://arxiv.org/pdf/2103.10681.pdf)<br><br>

<br>

<a name="SG"/> 

## Scene Graph

<a name="SGG"/> 

### Scene Graph Generation

[4] Bipartite Graph Network with Adaptive Message Passing for Unbiased Scene Graph Generation<br>
[paper](https://arxiv.org/abs/2104.00308)<br><br>

[3] Fully Convolutional Scene Graph Generation<br>
[paper](https://arxiv.org/abs/2103.16083)<br><br>

[2] Probabilistic Modeling of Semantic Ambiguity for Scene Graph Generation<br>
[paper](https://arxiv.org/abs/2103.05271)<br><br>

[1] Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis<br>
[paper](https://arxiv.org/pdf/2103.05558.pdf)<br><br>

<a name="SGP"/> 

### Scene Graph Prediction

[1] SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences<br>
[paper](https://arxiv.org/abs/2103.14898)<br><br>

<a name="SGU"/> 

### Scene Graph Understanding

[4] Semantic Scene Completion via Integrating Instances and Scene in-the-Loop<br>
[paper](https://arxiv.org/abs/2104.03640) | [code](https://github.com/yjcaimeow/SISNet)<br><br>

[3] 3D-to-2D Distillation for Indoor Scene Parsing<br>
[paper](https://arxiv.org/abs/2104.02243)<br><br>

[2] Bidirectional Projection Network for Cross Dimension Scene Understanding<br>
[paper](https://arxiv.org/abs/2103.14326) | [code](https://github.com/wbhu/BPNet)<br><br>

[1] Monte Carlo Scene Search for 3D Scene Understanding<br>
[paper](https://arxiv.org/pdf/2103.07969.pdf)<br><br>

<br><br>

<a name="VisualLocalization"/> 

## Visual Localization

[1] LoFTR: Detector-Free Local Feature Matching with Transformers (image feature matching)<br>
[paper](https://arxiv.org/abs/2104.00680) | [project](https://zju3dv.github.io/loftr/)<br><br>


<br><br>

<a name="VisualReasoning"/> 

## Visual Reasoning/VQA

[9] Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules<br>
[paper](https://arxiv.org/abs/2105.04836)<br><br>

[8] Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering<br>
[paper](https://arxiv.org/abs/2104.14085)<br><br>

[7] PQA: Perceptual Question Answering<br>
[paper](https://arxiv.org/abs/2104.03589)<br><br>

[6] Domain-robust VQA with diverse datasets and methods but no target labels<br>
[paper](https://arxiv.org/abs/2103.15974)<br><br>

[5] AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning<br>
[paper](https://arxiv.org/abs/2103.16002)<br><br>

[4] Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution<br>
[paper](https://arxiv.org/abs/2103.14230) | [project](http://wellyzhang.github.io/project/prae.html) | [supplementary](http://wellyzhang.github.io/attach/cvpr21zhang_prae_supp.pdf)<br><br>

[3] ACRE: Abstract Causal REasoning Beyond Covariation<br>
[paper](https://arxiv.org/abs/2103.14232) | [project](http://wellyzhang.github.io/project/acre.html) | [Supplementary](http://wellyzhang.github.io/attach/cvpr21zhang_acre_supp.pdf)<br><br>

[2] TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events<br>
[paper](https://arxiv.org/abs/2103.15538) | [project](https://github.com/SUTDCV/SUTD-TrafficQA)<br><br>

[1] Transformation Driven Visual Reasoning<br>
[paper](https://arxiv.org/pdf/2011.13160.pdf) | [code](https://github.com/hughplay/TVR) | [project](https://hongxin2019.github.io/TVR/)<br><br>


<br><br>

<a name="ImageClassification"/> 

## Image Classification

[5] Benchmarking Representation Learning for Natural World Image Collections<br>
[paper](https://arxiv.org/abs/2103.16483)<br><br>

[4] Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark<br>
[paper](https://arxiv.org/abs/2103.10895) | [project&dataset](https://vap.aau.dk/sewer-ml/)<br><br>

[3] Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification<br>
[paper](https://arxiv.org/abs/2103.14267)<br><br>

[2] PML: Progressive Margin Loss for Long-tailed Age Classification<br>
[paper](https://arxiv.org/abs/2103.02140)<br><br>

[1] A Realistic Evaluation of Semi-Supervised Learning for Fine-Grained Classification<br>
[paper](https://arxiv.org/abs/2104.00679)<br><br>


<br><br>

<a name="domain"/> 

## Transfer Learning/Domain Adaptation

[24] FixBi: Bridging Domain Spaces for Unsupervised Domain Adaptation<br>
[paper](https://arxiv.org/abs/2011.09230)<br><br>

[23] Ranking Neural Checkpoints<br>
[paper](https://arxiv.org/abs/2011.11200)<br><br>

[22] How Well Do Self-Supervised Models Transfer?<br>
[paper](https://arxiv.org/abs/2011.13377) | [code](https://github.com/linusericsson/ssl-transfer)<br><br>

[21] Adversarially Adaptive Normalization for Single Domain Generalization<br>
[paper](https://arxiv.org/abs/2106.01899)<br><br>

[20] Visualizing Adapted Knowledge in Domain Transfer<br>
[paper](https://arxiv.org/abs/2104.10602) | [code](https://github.com/hou-yz/DA_visualization)<br><br>

[19] Instance Level Affinity-Based Transfer for Unsupervised Domain Adaptation<br>
[paper](https://arxiv.org/abs/2104.01286) | [code](https://github.com/astuti/ILA-DA)<br><br>

[18] Unsupervised Multi-source Domain Adaptation Without Access to Source Data<br>
[paper](https://arxiv.org/abs/2104.01845)<br><br>

[17] Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation<br>
[paper](https://arxiv.org/abs/2104.00808)<br><br>

[16] Divergence Optimization for Noisy Universal Domain Adaptation<br>
[paper](https://arxiv.org/abs/2104.00246)<br><br>

[15] Prototypical Cross-domain Self-supervised Learning for Few-shot Unsupervised Domain Adaptation<br>
[paper](https://arxiv.org/abs/2103.16765) | [project](http://xyue.io/pcs-fuda/index.html)<br><br>

[14] Progressive Domain Expansion Network for Single Domain Generalization<br>
[paper](https://arxiv.org/abs/2103.16050)<br><br>

[13] Dynamic Domain Adaptation for Efficient Inference<br>
[paper](https://arxiv.org/abs/2103.16403)<br><br>

[12] Adaptive Methods for Real-World Domain Generalization<br>
[paper](https://arxiv.org/abs/2103.15796)<br><br>

[11] OTCE: A Transferability Metric for Cross-Domain Cross-Task Representations<br>
[paper](https://arxiv.org/abs/2103.13843)<br><br>

[10] DRANet: Disentangling Representation and Adaptation Networks for Unsupervised Cross-Domain Adaptation<br>
[paper](https://arxiv.org/abs/2103.13447)<br><br>

[9] MetaAlign: Coordinating Domain Alignment and Classification for Unsupervised Domain Adaptation<br>
[paper](https://arxiv.org/pdf/2103.13575.pdf)<br><br>

[8] Transferable Semantic Augmentation for Domain Adaptation<br>
[paper](https://arxiv.org/abs/2103.12562) | [code](https://github.com/BIT-DA/TSA)<br><br>

[7] Dynamic Transfer for Multi-Source Domain Adaptation<br>
[paper](https://arxiv.org/abs/2103.10583)<br><br>

[6] Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation<br>
[paper](https://arxiv.org/pdf/2103.04705.pdf)<br><br>

[5] Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation<br>
[paper](https://arxiv.org/abs/2103.04717)<br><br>

[4] Continual Adaptation of Visual Representations via Domain Randomization and Meta-learning<br>
[paper](https://arxiv.org/abs/2012.04324)<br><br>

[3] Domain Generalization via Inference-time Label-Preserving Target Projections<br>
[paper](https://arxiv.org/abs/2103.01134)<br><br>

[2] MetaSCI: Scalable and Adaptive Reconstruction for Video Compressive Sensing<br>
[paper](https://arxiv.org/abs/2103.01786) | [code](https://github.com/xyvirtualgroup/MetaSCI-CVPR2021)<br><br>

[1] FSDR: Frequency Space Domain Randomization for Domain Generalization<br>
[paper](https://arxiv.org/abs/2103.02370)<br><br>


<br><br>

<a name="MetricLearning"/> 

### Metric Learning

[5] SLADE: A Self-Training Framework For Distance Metric Learning<br>
[paper](https://arxiv.org/abs/2011.10269)<br><br>

[4] MetricOpt: Learning to Optimize Black-Box Evaluation Metrics<br>
[paper](https://arxiv.org/abs/2104.10631)<br><br>

[3] Noise-resistant Deep Metric Learning with Ranking-based Instance Selection<br>
[paper](https://arxiv.org/abs/2103.16047)<br><br>

[2] Embedding Transfer with Label Relaxation for Improved Metric Learning<br>
[paper](https://arxiv.org/abs/2103.14908)<br><br>

[1] Dynamic Metric Learning: Towards a Scalable Metric Space to Accommodate Multiple Semantic Scales<br>
[paper](https://arxiv.org/pdf/2103.11781.pdf) | [code](https://github.com/SupetZYK/DynamicMetricLearning)<br><br>

<br><br>

<a name="ContrastiveLearning"/> 

## Contrastive Learning

[4] Dual-stream Multiple Instance Learning Network for Whole Slide Image Classification with Self-supervised Contrastive Learning<br>
[paper](https://arxiv.org/abs/2011.08939) | [code](https://github.com/binli123/dsmil-wsi)<br><br>

[3] Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification<br>
[paper](https://arxiv.org/abs/2103.14267)<br><br>

[2] AdCo: Adversarial Contrast for Efficient Learning of Unsupervised Representations from Self-Trained Negative Adversaries<br>
[paper](https://arxiv.org/abs/2011.08435) | [code](https://github.com/maple-research-lab/AdCo) | [interpretation: AdCo, adversarial contrastive learning](https://mp.weixin.qq.com/s/u7Lhzh8uYEEHfWiM32-4yQ)<br><br>

[1] Fine-grained Angular Contrastive Learning with Coarse Labels<br>
[paper](https://arxiv.org/abs/2012.03515)<br><br>


<br><br>

<a name="IncrementalLearning"/> 

## Incremental Learning



[4] Few-Shot Incremental Learning with Continually Evolved Classifiers<br>
[paper](https://arxiv.org/abs/2104.03047)<br><br>

[3] DER: Dynamically Expandable Representation for Class Incremental Learning<br>
[paper](https://arxiv.org/abs/2103.16788)<br><br>

[2] Semantic-aware Knowledge Distillation for Few-Shot Class-Incremental Learning<br>
[paper](https://arxiv.org/abs/2103.04059)<br><br>

[1] On Learning the Geodesic Path for Incremental Learning<br>
[paper](https://arxiv.org/abs/2104.08572)<br><br>


<br><br>
<a name="RL"/> 

## Reinforcement Learning

[2] Unsupervised Visual Attention and Invariance for Reinforcement Learning<br>
[paper](https://arxiv.org/abs/2104.02921)<br><br>

[1] Unsupervised Learning for Robust Fitting: A Reinforcement Learning Approach<br>
[paper](https://arxiv.org/abs/2103.03501)<br><br>

<br><br>
<a name="MetaLearning"/> 

## Meta Learning

[4] Cluster, Split, Fuse, and Update: Meta-Learning for Open Compound Domain Adaptive Semantic Segmentation<br>
[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Gong_Cluster_Split_Fuse_and_Update_Meta-Learning_for_Open_Compound_Domain_CVPR_2021_paper.pdf)<br><br>

[3] Faster Meta Update Strategy for Noise-Robust Deep Learning<br>
[paper](https://arxiv.org/abs/2104.15092)<br><br>

[2] Meta-Mining Discriminative Samples for Kinship Verification<br>
[paper](https://arxiv.org/pdf/2103.15108.pdf)<br><br>

[1] MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition<br>
[paper](https://arxiv.org/pdf/2103.12579.pdf)<br><br>

<br><br>

<a name="100"/> 

## Uncategorized

Rotation-Only Bundle Adjustment<br>
[paper](https://arxiv.org/abs/2011.11724)<br><br>

HDR Environment Map Estimation for Real-Time Augmented Reality<br>
[paper](https://arxiv.org/abs/2011.10687) | [video](https://docs-assets.developer.apple.com/ml-research/papers/hdr-environment-map.mp4)<br><br>

Magic Layouts: Structural Prior for Component Detection in User Interface Designs<br>
[paper](https://arxiv.org/abs/2106.07615)<br><br>

Unsupervised Part Discovery via Feature Alignment<br>
[paper](https://arxiv.org/abs/2012.00313)<br><br>

Neural Prototype Trees for Interpretable Fine-grained Image Recognition (Interpretability)<br>
[paper](https://arxiv.org/abs/2012.02046) | [code](https://github.com/M-Nauta/ProtoTree)<br><br>

How Robust are Randomized Smoothing based Defenses to Data Poisoning?<br>
[paper](https://arxiv.org/abs/2012.01274)<br><br>

Uncalibrated Neural Inverse Rendering for Photometric Stereo of General Surfaces<br>
[paper](https://arxiv.org/abs/2012.06777)<br><br>

The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models<br>
[paper](https://arxiv.org/abs/2012.06908) | [code](https://github.com/VITA-Group/CV_LTH_Pre-training)<br><br>

The Lottery Ticket Hypothesis for Object Recognition<br>
[paper](https://arxiv.org/abs/2012.04643)<br><br>

One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing (Video Synthesis)<br>
[paper](https://arxiv.org/abs/2011.15126) | [project](https://nvlabs.github.io/face-vid2vid)<br><br>

Stochastic Image-to-Video Synthesis using cINNs (Video Synthesis)<br>
[paper](https://arxiv.org/abs/2105.04551) | [project](https://bit.ly/3t66bnU)<br><br>

NeRD: Neural 3D Reflection Symmetry Detector<br>
[paper](https://arxiv.org/abs/2105.03211) | [code](https://github.com/zhou13/nerd)<br><br>

Function4D: Real-time Human Volumetric Capture from Very Sparse Consumer RGBD Sensors (Human Volumetric Capture)<br>
[paper](https://arxiv.org/abs/2105.01859) | [project](http://www.liuyebin.com/Function4D/Function4D.html) | [video](https://www.youtube.com/watch?v=-rWUn4fEQNU&t=126s)<br><br>

AutoFlow: Learning a Better Training Set for Optical Flow<br>
[paper](https://arxiv.org/abs/2104.14544) | [code](https://autoflow-google.gi
