Repository: sbrugman/deep-learning-papers Branch: master Commit: 358e237239de Files: 3 Total size: 49.4 KB Directory structure: gitextract_a3mjjkf_/ ├── .gitignore ├── README.md └── task-papers.md ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitignore ================================================ .idea/* ================================================ FILE: README.md ================================================ # Deep Learning Papers by task Papers about deep learning ordered by task, date. For each paper there is a permanent link, which is either to Arxiv.org or to a copy of the original paper in this repository. # Table of Contents 1. [Code](#code) 1.1. [Code Generation](#code-generation) 1.2. [Malware Detection and Security](#malware-detection-and-security) 2. [Text](#text) 2.1. [Summarization](#summarization) 2.2. [Taskbots](#taskbots) 2.3. [Classification](#classification) 2.4. [Question Answering](#question-answering) 2.5. [Sentiment Analysis](#sentiment-analysis) 2.6. [Translation](#translation) 2.7. [Chatbots](#chatbots) 2.8. [Reasoning](#reasoning) 2.9. [Language Representation](#language-representation) 3. [Visual](#visual) 3.1. [Gaming](#gaming) 3.2. [Style Transfer](#style-transfer) 3.3. [Object Tracking](#object-tracking) 3.4. [Visual Question Answering](#visual-question-answering) 3.5. [Image Segmentation](#image-segmentation) 3.6. [Text (in the Wild) Recognition](#text-in-the-wild-recognition) 3.7. [Brain Computer Interfacing](#brain-computer-interfacing) 3.8. [Self-Driving Cars](#self-driving-cars) 3.9. [Object Recognition](#object-recognition) 3.10. [Logo Recognition](#logo-recognition) 3.11. [Super Resolution](#super-resolution) 3.12. [Pose Estimation](#pose-estimation) 3.13. [Image Captioning](#image-captioning) 3.14. [Image Compression](#image-compression) 3.15. [Image Synthesis](#image-synthesis) 3.16. 
[Face Recognition](#face-recognition) 3.17. [Image Composition](#image-composition) 3.18. [Scene Graph Parsing](#scene-graph-parsing) 3.19. [Video Deblurring](#video-deblurring) 3.20. [Depth Perception](#depth-perception) 3.21. [3D Reconstruction](#3d-reconstruction) 3.22. [Vision Representation](#vision-representation) 4. [Audio](#audio) 4.1. [Audio Synthesis](#audio-synthesis) 5. [Other](#other) 5.1. [Unclassified](#unclassified) 5.2. [Regularization](#regularization) 5.3. [Neural Network Compression](#neural-network-compression) 5.4. [Optimizers](#optimizers) ## Code ### Code Generation |Title|Date|Paper|Code| |---|---|---|---| | DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning | _25 apr 2017_ | [arxiv](https://arxiv.org/pdf/1704.07734) | | | A Syntactic Neural Model for General-Purpose Code Generation | _6 apr 2017_ | [arxiv](https://arxiv.org/pdf/1704.01696) | | | RobustFill: Neural Program Learning under Noisy I/O | _21 mar 2017_ | [arxiv](https://arxiv.org/pdf/1703.07469) | | | DeepFix: Fixing Common C Language Errors by Deep Learning | _12 feb 2017_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/deepfix-fixing-common-c-language-errors-by-deep-learning.pdf) | | | DeepCoder: Learning to Write Programs | _7 nov 2016_ | [arxiv](https://arxiv.org/pdf/1611.01989) | | | Neuro-Symbolic Program Synthesis | _6 nov 2016_ | [arxiv](https://arxiv.org/pdf/1611.01855) | | | Deep API Learning | _27 may 2016_ | [arxiv](https://arxiv.org/pdf/1605.08535) | | ### Malware Detection and Security |Title|Date|Paper|Code| |---|---|---|---| | PassGAN: A Deep Learning Approach for Password Guessing | _1 sep 2017_ | [arxiv](https://arxiv.org/pdf/1709.00440) | | | Deep Android Malware Detection | _22 mar 2016_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/deep-android-malware-detection.pdf) | [github](https://github.com/niallmcl/Deep-Android-Malware-Detection) | | Droid-Sec: Deep Learning in Android Malware 
Detection | _17 aug 2014_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/droid-sec-deep-learning-in-android-malware-detection.pdf) | [github](https://github.com/pjlantz/droidbox) | ## Text ### Summarization |Title|Date|Paper|Code| |---|---|---|---| | A Deep Reinforced Model for Abstractive Summarization | _11 may 2017_ | [arxiv](https://arxiv.org/pdf/1705.04304) | | | Get To The Point: Summarization with Pointer-Generator Networks | _14 apr 2017_ | [arxiv](https://arxiv.org/pdf/1704.04368) | | | SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents | _14 nov 2016_ | [arxiv](https://arxiv.org/pdf/1611.04230) | | ### Taskbots |Title|Date|Paper|Code| |---|---|---|---| | Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning | _10 apr 2017_ | [arxiv](https://arxiv.org/pdf/1704.03084) | [github](https://github.com/MiuLab/TC-Bot) | | End-to-End Task-Completion Neural Dialogue Systems | _3 mar 2017_ | [arxiv](https://arxiv.org/pdf/1703.01008) | [github](https://github.com/MiuLab/TC-Bot) | ### Classification |Title|Date|Paper|Code| |---|---|---|---| | A Large Self-Annotated Corpus for Sarcasm | _19 apr 2017_ | [arxiv](https://arxiv.org/pdf/1704.05579) | | | ConceptNet at SemEval-2017 Task 2: Extending Word Embeddings with Multilingual Relational Knowledge | _11 apr 2017_ | [arxiv](https://arxiv.org/pdf/1704.03560) | | | Bilateral Multi-Perspective Matching for Natural Language Sentences | _13 feb 2017_ | [arxiv](https://arxiv.org/pdf/1702.03814) | | | FastText.zip: Compressing text classification models | _12 dec 2016_ | [arxiv](https://arxiv.org/pdf/1612.03651) | | | ConceptNet 5.5: An Open Multilingual Graph of General Knowledge | _12 dec 2016_ | [arxiv](https://arxiv.org/pdf/1612.03975) | | | A Simple but Tough-to-Beat Baseline for Sentence Embeddings | _4 nov 2016_ | 
[paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/a-simple-but-tough-to-beat-baseline-for-sentence-embeddings.pdf) | [github](https://github.com/YingyuLiang/SIF) | | Enriching Word Vectors with Subword Information | _15 jul 2016_ | [arxiv](https://arxiv.org/pdf/1607.04606) | | | From Word Embeddings To Document Distances | _6 jul 2016_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/from-word-embeddings-to-document-distances.pdf) | [github](https://github.com/mkusner/wmd) | | Bag of Tricks for Efficient Text Classification | _6 jul 2016_ | [arxiv](https://arxiv.org/pdf/1607.01759) | | | Character-level Convolutional Networks for Text Classification | _4 sep 2015_ | [arxiv](https://arxiv.org/pdf/1509.01626) | | | GloVe: Global Vectors for Word Representation | _25 may 2015_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/glove-global-vectors-for-word-representation.pdf) | [github](https://github.com/stanfordnlp/GloVe) | | Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks | _28 feb 2015_ | [arxiv](https://arxiv.org/pdf/1503.00075) | | | Distributed Representations of Sentences and Documents | _16 may 2014_ | [arxiv](https://arxiv.org/pdf/1405.4053) | | | Efficient Estimation of Word Representations in Vector Space | _16 jan 2013_ | [arxiv](https://arxiv.org/pdf/1301.3781) | | | SimHash: Hash-based Similarity Detection | _13 dec 2007_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/simhash-hash-based-similarity-detection.pdf) | | ### Question Answering |Title|Date|Paper|Code| |---|---|---|---| | IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models | _30 may 2017_ | [arxiv](https://arxiv.org/pdf/1705.10513) | [github](https://github.com/geek-ai/irgan) | ### Sentiment Analysis |Title|Date|Paper|Code| |---|---|---|---| | Rationalizing Neural Predictions | _13 jun 2016_ | 
[arxiv](https://arxiv.org/pdf/1606.04155) | | | Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank | _18 oct 2013_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/recursive-deep-models-for-semantic-compositionality-over-a-sentiment-treebank.pdf) | | ### Translation |Title|Date|Paper|Code| |---|---|---|---| | Attention Is All You Need | _12 jun 2017_ | [arxiv](https://arxiv.org/pdf/1706.03762) | | | Convolutional Sequence to Sequence Learning | _8 may 2017_ | [arxiv](https://arxiv.org/pdf/1705.03122) | [github](https://github.com/facebookresearch/fairseq) | | Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation | _14 nov 2016_ | [arxiv](https://arxiv.org/pdf/1611.04558) | | | A Convolutional Encoder Model for Neural Machine Translation | _7 nov 2016_ | [arxiv](https://arxiv.org/pdf/1611.02344) | | | Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation | _26 sep 2016_ | [arxiv](https://arxiv.org/pdf/1609.08144) | | | Neural Machine Translation by Jointly Learning to Align and Translate | _1 sep 2014_ | [arxiv](https://arxiv.org/pdf/1409.0473) | | ### Chatbots |Title|Date|Paper|Code| |---|---|---|---| | A Deep Reinforcement Learning Chatbot | _7 sep 2017_ | [arxiv](https://arxiv.org/pdf/1709.02349) | | | A Neural Conversational Model | _19 jun 2015_ | [arxiv](https://arxiv.org/pdf/1506.05869) | [github](https://github.com/inikdom/neural-chatbot) | ### Reasoning |Title|Date|Paper|Code| |---|---|---|---| | NeuroSAT: Learning a SAT Solver from Single-Bit Supervision | _5 jan 2019_ | [arxiv](https://arxiv.org/pdf/1802.03685.pdf) | [github](https://github.com/dselsam/neurosat) | | Tracking the World State with Recurrent Entity Networks | _12 dec 2016_ | [arxiv](https://arxiv.org/pdf/1612.03969) | | ### Language Representation |Title|Date|Paper|Code| |---|---|---|---| | Efficient Estimation of Word Representations in Vector Space | _7 sep
2013_ | [arxiv](https://arxiv.org/pdf/1301.3781.pdf) | | | Distributed Representations of Words and Phrases and their Compositionality | _16 oct 2013_ | [arxiv](https://arxiv.org/pdf/1310.4546.pdf) | | | ELMo: Deep contextualized word representations | _22 Mar 2018_ | [arxiv](https://arxiv.org/pdf/1802.05365.pdf) | | | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | _24 may 2019_ | [arxiv](https://arxiv.org/pdf/1810.04805.pdf) | [github](https://github.com/google-research/bert) | | XLNet: Generalized Autoregressive Pretraining for Language Understanding | _19 jun 2019_ | [arxiv](https://arxiv.org/pdf/1906.08237.pdf) | [github](https://github.com/zihangdai/xlnet) | | RoBERTa: A Robustly Optimized BERT Pretraining Approach | _26 jul 2019_ | [arxiv](https://arxiv.org/pdf/1907.11692.pdf) | [github](https://github.com/pytorch/fairseq/tree/master/examples/roberta) | ## Visual ### Gaming |Title|Date|Paper|Code| |---|---|---|---| | Phase-Functioned Neural Networks for Character Control | _1 may 2017_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/phase-functioned-neural-networks-for-character-control.pdf) | | | Equivalence Between Policy Gradients and Soft Q-Learning | _21 apr 2017_ | [arxiv](https://arxiv.org/pdf/1704.06440) | | | Beating Atari with Natural Language Guided Reinforcement Learning | _18 apr 2017_ | [arxiv](https://arxiv.org/pdf/1704.05539) | | | Learning from Demonstrations for Real World Reinforcement Learning | _12 apr 2017_ | [arxiv](https://arxiv.org/pdf/1704.03732) | | | FeUdal Networks for Hierarchical Reinforcement Learning | _3 mar 2017_ | [arxiv](https://arxiv.org/pdf/1703.01161) | | | Overcoming catastrophic forgetting in neural networks | _2 dec 2016_ | [arxiv](https://arxiv.org/pdf/1612.00796) | | | Playing Doom with SLAM-Augmented Deep Reinforcement Learning | _1 dec 2016_ | [arxiv](https://arxiv.org/pdf/1612.00380) | | | Playing FPS Games with Deep Reinforcement Learning | _18 sep
2016_ | [arxiv](https://arxiv.org/pdf/1609.05521) | | | DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess | _16 aug 2016_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/deepchess-end-to-end-deep-neural-network-for-automatic-learning-in-chess.pdf) | | | Generative Adversarial Imitation Learning | _10 jun 2016_ | [arxiv](https://arxiv.org/pdf/1606.03476) | | | Dueling Network Architectures for Deep Reinforcement Learning | _20 nov 2015_ | [arxiv](https://arxiv.org/pdf/1511.06581) | | | Prioritized Experience Replay | _18 nov 2015_ | [arxiv](https://arxiv.org/pdf/1511.05952) | | | Human-level control through deep reinforcement learning | _26 feb 2015_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/human-level-control-through-deep-reinforcement-learning.pdf) | | | Playing Atari with Deep Reinforcement Learning | _19 dec 2013_ | [arxiv](https://arxiv.org/pdf/1312.5602) | | ### Style Transfer |Title|Date|Paper|Code| |---|---|---|---| | The Contextual Loss for Image Transformation with Non-Aligned Data | _18 jul 2018_ | [arxiv](https://arxiv.org/pdf/1803.02077.pdf) | [github](https://github.com/roimehrez/contextualLoss) | | Deep Photo Style Transfer | _22 mar 2017_ | [arxiv](https://arxiv.org/pdf/1703.07511) | | | Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization | _20 mar 2017_ | [arxiv](https://arxiv.org/pdf/1703.06868) | [github](https://github.com/xunhuang1995/AdaIN-style) | | A Learned Representation For Artistic Style | _24 oct 2016_ | [arxiv](https://arxiv.org/pdf/1610.07629) | | | Instance Normalization: The Missing Ingredient for Fast Stylization | _27 jul 2016_ | [arxiv](https://arxiv.org/pdf/1607.08022) | | | Perceptual Losses for Real-Time Style Transfer and Super-Resolution | _27 mar 2016_ | [arxiv](https://arxiv.org/pdf/1603.08155) | [github](http://github.com/jcjohnson/fast-neural-style) | | A Neural Algorithm of Artistic Style | _26 aug 2015_ |
[arxiv](https://arxiv.org/pdf/1508.06576) | [github](https://github.com/lengstrom/fast-style-transfer/) | ### Object Tracking |Title|Date|Paper|Code| |---|---|---|---| | End-to-end representation learning for Correlation Filter based tracking | _20 apr 2017_ | [arxiv](https://arxiv.org/pdf/1704.06036) | [github](https://github.com/bertinetto/cfnet) | ### Visual Question Answering |Title|Date|Paper|Code| |---|---|---|---| | VQA: Visual Question Answering | _3 may 2015_ | [arxiv](https://arxiv.org/pdf/1505.00468) | | ### Image Segmentation |Title|Date|Paper|Code| |---|---|---|---| | PointRend: Image Segmentation as Rendering | _17 dec 2019_ | [arxiv](https://arxiv.org/pdf/1912.08193.pdf) | | | Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation | _22 aug 2018_ | [paper](https://arxiv.org/pdf/1802.02611.pdf) [cvpr](http://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_Auto-DeepLab_Hierarchical_Neural_Architecture_Search_for_Semantic_Image_Segmentation_CVPR_2019_paper.pdf) | [github](https://github.com/tensorflow/models/tree/master/research/deeplab) | | Dilated Residual Networks | _22 jul 2017_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/dilated-residual-networks.pdf) | | | SfM-Net: Learning of Structure and Motion from Video | _25 apr 2017_ | [arxiv](https://arxiv.org/pdf/1704.07804) | | | Semi and Weakly Supervised Semantic Segmentation Using Generative Adversarial Network | _28 mar 2017_ | [arxiv](https://arxiv.org/pdf/1703.09695) | | | Mask R-CNN | _20 mar 2017_ | [arxiv](https://arxiv.org/pdf/1703.06870) | | | Learning Features by Watching Objects Move | _19 dec 2016_ | [arxiv](https://arxiv.org/pdf/1612.06370) | | | RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation | _20 nov 2016_ | [arxiv](https://arxiv.org/pdf/1611.06612) | [github](https://github.com/guosheng/refinenet) | | UberNet: Training a `Universal' Convolutional Neural Network for Low-, Mid-, and 
High-Level Vision using Diverse Datasets and Limited Memory | _7 sep 2016_ | [arxiv](https://arxiv.org/pdf/1609.02132) | | | DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs | _2 jun 2016_ | [arxiv](https://arxiv.org/pdf/1606.00915) | | | Fully Convolutional Networks for Semantic Segmentation | _20 may 2016_ | [arxiv](https://arxiv.org/pdf/1605.06211) | [github](https://github.com/shelhamer/fcn.berkeleyvision.org) | | Instance-aware Semantic Segmentation via Multi-task Network Cascades | _14 dec 2015_ | [arxiv](https://arxiv.org/pdf/1512.04412) | | | Multi-Scale Context Aggregation by Dilated Convolutions | _23 nov 2015_ | [arxiv](https://arxiv.org/pdf/1511.07122) | | | SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation | _2 nov 2015_ | [arxiv](https://arxiv.org/pdf/1511.00561) | | | U-Net: Convolutional Networks for Biomedical Image Segmentation | _18 may 2015_ | [arxiv](https://arxiv.org/pdf/1505.04597) | | | Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs | _22 dec 2014_ | [arxiv](https://arxiv.org/pdf/1412.7062) | | | Learning Rich Features from RGB-D Images for Object Detection and Segmentation | _22 jul 2014_ | [arxiv](https://arxiv.org/pdf/1407.5736) | | ### Text (in the Wild) Recognition |Title|Date|Paper|Code| |---|---|---|---| | OCR Error Correction Using Character Correction and Feature-Based Word Classification | _21 apr 2016_ | [arxiv](https://arxiv.org/pdf/1604.06225) | | | Recursive Recurrent Nets with Attention Modeling for OCR in the Wild | _9 mar 2016_ | [arxiv](https://arxiv.org/pdf/1603.03101) | | | COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images | _26 jan 2016_ | [arxiv](https://arxiv.org/pdf/1601.07140) | | | Efficient Scene Text Localization and Recognition with Local Character Refinement | _14 apr 2015_ | [arxiv](https://arxiv.org/pdf/1504.03522) | | | Reading Text in the Wild 
with Convolutional Neural Networks | _4 dec 2014_ | [arxiv](https://arxiv.org/pdf/1412.1842) | | | Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition | _9 jun 2014_ | [arxiv](https://arxiv.org/pdf/1406.2227) | | ### Brain Computer Interfacing |Title|Date|Paper|Code| |---|---|---|---| | Deep learning with convolutional neural networks for brain mapping and decoding of movement-related information from the human EEG | _15 mar 2017_ | [arxiv](https://arxiv.org/pdf/1703.05051) | | | Encoding Voxels with Deep Learning | _2 dec 2015_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/encoding-voxels-with-deep-learning.pdf) | | | Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Ventral Stream | _8 jul 2015_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/deep-neural-networks-reveal-a-gradient-in-the-complexity-of-neural-representations-across-the-ventral-stream.pdf) | | ### Self-Driving Cars |Title|Date|Paper|Code| |---|---|---|---| | Computer Vision for Autonomous Vehicles: Problems, Datasets and State-of-the-Art | _18 apr 2017_ | [arxiv](https://arxiv.org/pdf/1704.05519) | | | End to End Learning for Self-Driving Cars | _25 apr 2016_ | [arxiv](https://arxiv.org/pdf/1604.07316) | | ### Object Recognition |Title|Date|Paper|Code| |---|---|---|---| | Cascade R-CNN: High Quality Object Detection and Instance Segmentation | _24 Jun 2019_ | [arxiv](https://arxiv.org/pdf/1906.09756.pdf) | [github](https://github.com/zhaoweicai/cascade-rcnn) | | YOLOv3: An Incremental Improvement | _8 Apr 2018_ | [arxiv](https://arxiv.org/pdf/1804.02767.pdf) | [github]( https://github.com/pjreddie/darknet), [github reimplementation](https://github.com/ultralytics/yolov3) | | Focal Loss for Dense Object Detection | _7 aug 2017_ | [arxiv](https://arxiv.org/pdf/1708.02002) | | | Introspective Classifier Learning: Empower Generatively | _25 apr 2017_ | 
[arxiv](https://arxiv.org/pdf/1704.07816) | | | Learning Chained Deep Features and Classifiers for Cascade in Object Detection | _23 feb 2017_ | [arxiv](https://arxiv.org/pdf/1702.07054) | | | DSSD : Deconvolutional Single Shot Detector | _23 jan 2017_ | [arxiv](https://arxiv.org/pdf/1701.06659) | | | YOLO9000: Better, Faster, Stronger | _25 dec 2016_ | [arxiv](https://arxiv.org/pdf/1612.08242) | [github](https://github.com/pjreddie/darknet) | | Feature Pyramid Networks for Object Detection | _9 dec 2016_ | [arxiv](https://arxiv.org/pdf/1612.03144) | | | Speed/accuracy trade-offs for modern convolutional object detectors | _30 nov 2016_ | [arxiv](https://arxiv.org/pdf/1611.10012) | | | Aggregated Residual Transformations for Deep Neural Networks | _16 nov 2016_ | [arxiv](https://arxiv.org/pdf/1611.05431) | | | Hierarchical Object Detection with Deep Reinforcement Learning | _11 nov 2016_ | [arxiv](https://arxiv.org/pdf/1611.03718) | | | Xception: Deep Learning with Depthwise Separable Convolutions | _7 oct 2016_ | [arxiv](https://arxiv.org/pdf/1610.02357) | | | Learning to Make Better Mistakes: Semantics-aware Visual Food Recognition | _1 oct 2016_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/learning-to-make-better-mistakes-semantics-aware-visual-food-recognition.pdf) | | | Densely Connected Convolutional Networks | _25 aug 2016_ | [arxiv](https://arxiv.org/pdf/1608.06993) | | | Residual Networks of Residual Networks: Multilevel Residual Networks | _9 aug 2016_ | [arxiv](https://arxiv.org/pdf/1608.02908) | | | Context Matters: Refining Object Detection in Video with Recurrent Neural Networks | _15 jul 2016_ | [arxiv](https://arxiv.org/pdf/1607.04648) | | | R-FCN: Object Detection via Region-based Fully Convolutional Networks | _20 may 2016_ | [arxiv](https://arxiv.org/pdf/1605.06409) | | | Training
Region-based Object Detectors with Online Hard Example Mining | _12 apr 2016_ | [arxiv](https://arxiv.org/pdf/1604.03540) | | | T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos | _9 apr 2016_ | [arxiv](https://arxiv.org/pdf/1604.02532) | | | Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning | _23 feb 2016_ | [arxiv](https://arxiv.org/pdf/1602.07261) | | | Deep Residual Learning for Image Recognition | _10 dec 2015_ | [arxiv](https://arxiv.org/pdf/1512.03385) | | | SSD: Single Shot MultiBox Detector | _8 dec 2015_ | [arxiv](https://arxiv.org/pdf/1512.02325) | | | Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) | _23 nov 2015_ | [arxiv](https://arxiv.org/pdf/1511.07289) | | | ParseNet: Looking Wider to See Better | _15 jun 2015_ | [arxiv](https://arxiv.org/pdf/1506.04579) | | | You Only Look Once: Unified, Real-Time Object Detection | _8 jun 2015_ | [arxiv](https://arxiv.org/pdf/1506.02640) | | | Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks | _4 jun 2015_ | [arxiv](https://arxiv.org/pdf/1506.01497) | | | Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification | _6 feb 2015_ | [arxiv](https://arxiv.org/pdf/1502.01852) | | | Deep Image: Scaling up Image Recognition | _13 jan 2015_ | [arxiv](https://arxiv.org/pdf/1501.02876) | | | Rich feature hierarchies for accurate object detection and semantic segmentation | _11 nov 2013_ | [arxiv](https://arxiv.org/pdf/1311.2524) | | | Selective Search for Object Recognition | _11 mar 2013_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/selective-search-for-object-recognition.pdf) | | | ImageNet Classification with Deep Convolutional Neural Networks | _3 dec 2012_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/imagenet-classification-with-deep-convolutional-neural-networks.pdf) | | ### Logo Recognition 
|Title|Date|Paper|Code| |---|---|---|---| | Deep Learning Logo Detection with Data Expansion by Synthesising Context | _29 dec 2016_ | [arxiv](https://arxiv.org/pdf/1612.09322) | | | Automatic Graphic Logo Detection via Fast Region-based Convolutional Networks | _20 apr 2016_ | [arxiv](https://arxiv.org/pdf/1604.06083) | | | LOGO-Net: Large-scale Deep Logo Detection and Brand Recognition with Deep Region-based Convolutional Networks | _8 nov 2015_ | [arxiv](https://arxiv.org/pdf/1511.02462) | | | DeepLogo: Hitting Logo Recognition with the Deep Neural Network Hammer | _7 oct 2015_ | [arxiv](https://arxiv.org/pdf/1510.02131) | | ### Super Resolution |Title|Date|Paper|Code| |---|---|---|---| | Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network | _16 sep 2016_ | [arxiv](https://arxiv.org/pdf/1609.05158) | | | Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network | _15 sep 2016_ | [arxiv](https://arxiv.org/pdf/1609.04802) | | | RAISR: Rapid and Accurate Image Super Resolution | _3 jun 2016_ | [arxiv](https://arxiv.org/pdf/1606.01299) | | | Perceptual Losses for Real-Time Style Transfer and Super-Resolution | _27 mar 2016_ | [arxiv](https://arxiv.org/pdf/1603.08155) | [github](http://github.com/jcjohnson/fast-neural-style) | | Image Super-Resolution Using Deep Convolutional Networks | _31 dec 2014_ | [arxiv](https://arxiv.org/pdf/1501.00092) | | ### Pose Estimation |Title|Date|Paper|Code| |---|---|---|---| | Forecasting Human Dynamics from Static Images | _11 apr 2017_ | [arxiv](https://arxiv.org/pdf/1704.03432) | | | Fast Single Shot Detection and Pose Estimation | _19 sep 2016_ | [arxiv](https://arxiv.org/pdf/1609.05590) | | | Convolutional Pose Machines | _30 jan 2016_ | [arxiv](https://arxiv.org/pdf/1602.00134) | | | Flowing ConvNets for Human Pose Estimation in Videos | _9 jun 2015_ | [arxiv](https://arxiv.org/pdf/1506.02897) | | ### Image Captioning |Title|Date|Paper|Code|
|---|---|---|---| | Actor-Critic Sequence Training for Image Captioning | _29 jun 2017_ | [arxiv](https://arxiv.org/pdf/1706.09601) | | | Detecting and Recognizing Human-Object Interactions | _24 apr 2017_ | [arxiv](https://arxiv.org/pdf/1704.07333) | | | Deep Reinforcement Learning-based Image Captioning with Embedding Reward | _12 apr 2017_ | [arxiv](https://arxiv.org/pdf/1704.03899) | | | Towards Diverse and Natural Image Descriptions via a Conditional GAN | _17 mar 2017_ | [arxiv](https://arxiv.org/pdf/1703.06029) | | | Temporal Tessellation: A Unified Approach for Video Analysis | _21 dec 2016_ | [arxiv](https://arxiv.org/pdf/1612.06950) | [github](https://github.com/dot27/temporal-tessellation) | | Self-critical Sequence Training for Image Captioning | _2 dec 2016_ | [arxiv](https://arxiv.org/pdf/1612.00563) | | | Generation and Comprehension of Unambiguous Object Descriptions | _7 nov 2015_ | [arxiv](https://arxiv.org/pdf/1511.02283) | | | Show, Attend and Tell: Neural Image Caption Generation with Visual Attention | _10 feb 2015_ | [arxiv](https://arxiv.org/pdf/1502.03044) | | | Long-term Recurrent Convolutional Networks for Visual Recognition and Description | _17 nov 2014_ | [arxiv](https://arxiv.org/pdf/1411.4389) | | ### Image Compression |Title|Date|Paper|Code| |---|---|---|---| | Full Resolution Image Compression with Recurrent Neural Networks | _18 aug 2016_ | [arxiv](https://arxiv.org/pdf/1608.05148) | | ### Image Synthesis |Title|Date|Paper|Code| |---|---|---|---| | Scene Text Synthesis for Efficient and Effective Deep Network Training | _26 jan 2019_ | [arxiv](https://arxiv.org/pdf/1901.09193.pdf) | | | A Neural Representation of Sketch Drawings | _11 apr 2017_ | [arxiv](https://arxiv.org/pdf/1704.03477) | | | BEGAN: Boundary Equilibrium Generative Adversarial Networks | _31 mar 2017_ | [arxiv](https://arxiv.org/pdf/1703.10717) | [github](https://github.com/carpedm20/BEGAN-tensorflow) | | Improved Training of Wasserstein GANs | _31 mar 2017_ | 
[arxiv](https://arxiv.org/pdf/1704.00028) | | | Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks | _30 mar 2017_ | [arxiv](https://arxiv.org/pdf/1703.10593) | [github](https://github.com/junyanz/CycleGAN) | | Wasserstein GAN | _26 jan 2017_ | [arxiv](https://arxiv.org/pdf/1701.07875) | | | RenderGAN: Generating Realistic Labeled Data | _4 nov 2016_ | [arxiv](https://arxiv.org/pdf/1611.01331) | | | Conditional Image Generation with PixelCNN Decoders | _16 jun 2016_ | [arxiv](https://arxiv.org/pdf/1606.05328) | | | Pixel Recurrent Neural Networks | _25 jan 2016_ | [arxiv](https://arxiv.org/pdf/1601.06759) | | | Generative Adversarial Networks | _10 jun 2014_ | [arxiv](https://arxiv.org/pdf/1406.2661) | | ### Face Recognition |Title|Date|Paper|Code| |---|---|---|---| | Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition | _24 oct 2016_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/accessorize-to-a-crime-real-and-stealthy-attacks-on-state-of-the-art-face-recognition.pdf) | | | OpenFace: A general-purpose face recognition library with mobile applications | _1 jun 2016_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/openface-a-general-purpose-face-recognition-library-with-mobile-applications.pdf) | | | Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns | _9 nov 2015_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/emotion-recognition-in-the-wild-via-convolutional-neural-networks-and-mapped-binary-patterns.pdf) | | | Deep Face Recognition | _7 sep 2015_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/deep-face-recognition.pdf) | | | Compact Convolutional Neural Network Cascade for Face Detection | _6 aug 2015_ | [arxiv](https://arxiv.org/pdf/1508.01292) | | | Learning Robust Deep Face Representation | _17 jul 2015_ |
[arxiv](https://arxiv.org/pdf/1507.04844) | | | Facenet: A unified embedding for face recognition and clustering | _12 jun 2015_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/facenet-a-unified-embedding-for-face-recognition-and-clustering.pdf) | | | Multi-view Face Detection Using Deep Convolutional Neural Networks | _10 feb 2015_ | [arxiv](https://arxiv.org/pdf/1502.02766) | | ### Image Composition |Title|Date|Paper|Code| |---|---|---|---| | Auto-Retoucher(ART) — A Framework for Background Replacement and Foreground Adjustment | _13 jan 2019_ | [arxiv](https://arxiv.org/pdf/1901.03954.pdf) (brave new task) | [github](https://github.com/woshiyyya/Auto-Retoucher-pytorch) (not able to reproduce results based on code) | | Spatial Fusion GAN for Image Synthesis | _14 Dec 2018_ | [arxiv](https://arxiv.org/pdf/1812.05840.pdf) (needs revision, interesting approach however) | [github](https://github.com/fnzhan/SF-GAN) (currently, no code available) | | Compositional GAN: Learning Conditional Image Composition | _23 Aug 2018_ | [arxiv](https://arxiv.org/pdf/1807.07560.pdf) (with respect to spatial orientation) | [github](https://github.com/azadis/CompositionalGAN) (currently, no code available) | | ST-GAN | _5 mar 2018_ | [arxiv](https://arxiv.org/pdf/1803.01837) (with respect to spatial orientation) | [github](https://github.com/chenhsuanlin/spatial-transformer-GAN) | | Deep Painterly Harmonization | _26 Jun 2018_ | [paper](https://arxiv.org/pdf/1804.03189.pdf) | [github](https://github.com/luanfujun/deep-painterly-harmonization) | | Deep Image Harmonization | _28 feb 2017_ | [paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Tsai_Deep_Image_Harmonization_CVPR_2017_paper.pdf) | [github](https://github.com/wasidennis/DeepHarmonization) (only code for inference) | | Understanding and Improving the Realism of Image Composites | _1 Jul 2012_ | [paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.682.3987&rep=rep1&type=pdf) 
| | ### Scene Graph Parsing |Title|Date|Paper|Code| |---|---|---|---| | Neural Motifs: Scene Graph Parsing with Global Context | _29 Mar 2018_ | [arxiv](https://arxiv.org/pdf/1711.06640.pdf) | [github](https://github.com/rowanz/neural-motifs) | ### Video Deblurring |Title|Date|Paper|Code| |---|---|---|---| | Spatio-Temporal Filter Adaptive Network for Video Deblurring | _28 Apr 2019_ | [arxiv](https://arxiv.org/pdf/1904.12257.pdf) | [github](https://shangchenzhou.com/projects/stfan/) (to appear) | ### Depth Perception |Title|Date|Paper|Code| |---|---|---|---| | Learning Depth with Convolutional Spatial Propagation Network | _13 Oct 2018_ | [arxiv](https://arxiv.org/pdf/1810.02695.pdf) | [github](https://github.com/XinJCheng/CSPN) | | Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches | _18 May 2016_ | [arxiv](https://arxiv.org/pdf/1510.05970.pdf) | [github](https://github.com/jzbontar/mc-cnn) | ### 3D Reconstruction |Title|Date|Paper|Code| |---|---|---|---| | Cerberus: A Multi-headed Derenderer | _28 May 2019_ | [arxiv](https://arxiv.org/pdf/1905.11940.pdf) | | ### Vision Representation |Title|Date|Paper|Code| |---|---|---|---| | VisualBERT: A Simple and Performant Baseline for Vision and Language | _9 aug 2019_ | [arxiv](https://arxiv.org/pdf/1908.03557.pdf) | | | Expected to appear: some paper learning an unsupervised vision representation that beats SOTA on a large number of tasks | | | | ## Audio ### Audio Synthesis |Title|Date|Paper|Code| |---|---|---|---| | Deep Cross-Modal Audio-Visual Generation | _26 apr 2017_ | [arxiv](https://arxiv.org/pdf/1704.08292) | | | A Neural Parametric Singing Synthesizer | _12 apr 2017_ | [arxiv](https://arxiv.org/pdf/1704.03809) | | | Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders | _5 apr 2017_ | [arxiv](https://arxiv.org/pdf/1704.01279) | [github](https://github.com/tensorflow/magenta/tree/master/magenta/models/nsynth) | | Tacotron: Towards End-to-End Speech Synthesis | _29
mar 2017_ | [arxiv](https://arxiv.org/pdf/1703.10135) | [github](https://github.com/Kyubyong/tacotron) | | Deep Voice: Real-time Neural Text-to-Speech | _25 feb 2017_ | [arxiv](https://arxiv.org/pdf/1702.07825) | | | WaveNet: A Generative Model for Raw Audio | _12 sep 2016_ | [arxiv](https://arxiv.org/pdf/1609.03499) | [github](https://github.com/ibab/tensorflow-wavenet) | ## Other ### Unclassified |Title|Date|Paper|Code| |---|---|---|---| | A simple neural network module for relational reasoning | _5 jun 2017_ | [arxiv](https://arxiv.org/pdf/1706.01427) | | | Deep Complex Networks | _27 may 2017_ | [arxiv](https://arxiv.org/pdf/1705.09792) | [github](https://github.com/ChihebTrabelsi/deep_complex_networks) | | Learning to Fly by Crashing | _19 apr 2017_ | [arxiv](https://arxiv.org/pdf/1704.05588) | | | Who Said What: Modeling Individual Labelers Improves Classification | _26 mar 2017_ | [arxiv](https://arxiv.org/pdf/1703.08774) | | | Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data | _18 okt 2016_ | [arxiv](https://arxiv.org/pdf/1610.05755) | | | DeepMath - Deep Sequence Models for Premise Selection | _14 jun 2016_ | [arxiv](https://arxiv.org/pdf/1606.04442) | | | Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue | _16 mar 2016_ | [arxiv](https://arxiv.org/pdf/1603.04992) | | | Long Short-Term Memory | _15 nov 1997_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/long-short-term-memory.pdf) | | ### Regularization |Title|Date|Paper|Code| |---|---|---|---| | Self-Normalizing Neural Networks | _8 jun 2017_ | [arxiv](https://arxiv.org/pdf/1706.02515) | | | Concrete Dropout | _22 may 2017_ | [arxiv](https://arxiv.org/pdf/1705.07832) | [github](https://github.com/yaringal/ConcreteDropout) | | Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning | _6 jun 2015_ | [arxiv](https://arxiv.org/pdf/1506.02142) | | | Batch Normalization: Accelerating Deep Network 
Training by Reducing Internal Covariate Shift | _11 feb 2015_ | [arxiv](https://arxiv.org/pdf/1502.03167) | | ### Neural Network Compression |Title|Date|Paper|Code| |---|---|---|---| | Design of Efficient Convolutional Layers using Single Intra-channel Convolution, Topological Subdivisioning and Spatial "Bottleneck" Structure | _15 aug 2016_ | [arxiv](https://arxiv.org/pdf/1608.04337) | | | SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size | _24 feb 2016_ | [arxiv](https://arxiv.org/pdf/1602.07360) | | | Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding | _1 okt 2015_ | [arxiv](https://arxiv.org/pdf/1510.00149) | | ### Optimizers |Title|Date|Paper|Code| |---|---|---|---| | Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour | _8 jun 2017_ | [arxiv](https://arxiv.org/pdf/1706.02677) | | | Equilibrated adaptive learning rates for non-convex optimization | _15 feb 2015_ | [arxiv](https://arxiv.org/pdf/1502.04390) | | | Adam: A Method for Stochastic Optimization | _22 dec 2014_ | [arxiv](https://arxiv.org/pdf/1412.6980) | | | Deep learning with Elastic Averaging SGD | _20 dec 2014_ | [arxiv](https://arxiv.org/pdf/1412.6651) | | | ADADELTA: An Adaptive Learning Rate Method | _22 dec 2012_ | [arxiv](https://arxiv.org/pdf/1212.5701) | | | Advances in Optimizing Recurrent Networks | _4 dec 2012_ | [arxiv](https://arxiv.org/pdf/1212.0901) | | | Efficient Backprop | _1 jul 1998_ | [paper](https://github.com/sbrugman/deep-learning-papers/raw/master/papers/efficient-backprop.pdf) | | ## A note on arXiv arXiv provides the world with access to the newest scientific developments. Open Access has a myriad of benefits, in particular, it allows science to be more efficient. Remember to think about the quality of the papers referenced. In particular, the importance of the [peer-review process](https://undsci.berkeley.edu/article/howscienceworks_16) for science. 
If you find an article on arXiv, you should check whether it has been peer-reviewed and published elsewhere. The authoritative version of the paper is not the version on arXiv; rather, it is the published, peer-reviewed version. The two versions may differ significantly. For example, this is the case with one of the papers that I once discussed in the Text and Multimedia Mining class at Radboud:

- [peer-reviewed version](http://opus.bath.ac.uk/55288/4/CaliskanEtAl_authors_full.pdf)
- [arXiv version](https://arxiv.org/abs/1608.07187)

Compare for yourself.

For the selection of the papers above, I choose open access over completeness. If you find another (open) version of a paper, you are invited to make a pull request.

================================================ FILE: task-papers.md ================================================

# Deep Learning Papers by task

Papers about deep learning ordered by task and date. Current state-of-the-art papers and papers useful for getting started are labelled.

## Object Recognition

* YOLO9000: Better, Faster, Stronger, dec 2016, [arxiv](https://arxiv.org/pdf/1612.08242v1) ![state-of-the-art](https://img.shields.io/badge/style-state_of_the_art-green.svg?style=flat&label=paper)
* Feature Pyramid Networks for Object Detection, dec 2016, [arxiv](https://arxiv.org/pdf/1612.03144.pdf)
* Densely Connected Convolutional Networks, dec 2016, [arxiv](https://arxiv.org/pdf/1608.06993.pdf)
* PVANet: Lightweight Deep Neural Networks for Real-time Object Detection, dec 2016, [arxiv](https://arxiv.org/pdf/1611.08588.pdf)
* Aggregated Residual Transformations for Deep Neural Networks, nov 2016, [arxiv](https://arxiv.org/pdf/1611.05431v1.pdf)
* Hierarchical Object Detection with Deep Reinforcement Learning, nov 2016, [arxiv](https://arxiv.org/pdf/1611.03718v1.pdf)
* Learning to Make Better Mistakes: Semantics-aware Visual Food Recognition, oct 2016, IBM, [paper](https://github.com/sbrugman/deep-learning-papers/blob/master/papers/learning-to-make-better-mistakes-semantics-aware-visual-food-recognition.pdf)
* T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos, aug 2016, [github](https://github.com/myfavouritekk/T-CNN), [arxiv](https://arxiv.org/pdf/1604.02532v3)
* Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, aug 2016, Google, [arxiv](https://arxiv.org/pdf/1602.07261.pdf)
* Residual Networks of Residual Networks: Multilevel Residual Networks, aug 2016, [arxiv](https://arxiv.org/pdf/1608.02908v1)
* Context Matters: Refining Object Detection in Video with Recurrent Neural Networks, jul 2016, [arxiv](https://arxiv.org/pdf/1607.04648.pdf)
* R-FCN: Object Detection via Region-based Fully Convolutional Networks, jun 2016, [arxiv](https://arxiv.org/pdf/1605.06409.pdf)
* Training Region-based Object Detectors with Online Hard Example Mining, apr 2016, Facebook, [arxiv](https://arxiv.org/pdf/1604.03540v1)
* Deep Residual Learning for Image Recognition, dec 2015,
[arxiv](https://arxiv.org/pdf/1512.03385v1)
* SSD: Single Shot MultiBox Detector, dec 2015, Google, [github](https://github.com/weiliu89/caffe/tree/ssd), [arxiv](https://arxiv.org/pdf/1512.02325v2)
* ParseNet: Looking Wider to See Better, jun 2015, [arxiv](https://arxiv.org/pdf/1506.04579v2.pdf)
* You Only Look Once: Unified, Real-Time Object Detection, jun 2015, Facebook, [arxiv](https://arxiv.org/pdf/1506.02640v5)
* Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, jun 2015, Microsoft/Facebook, [arxiv](https://arxiv.org/pdf/1506.01497v3.pdf)
* Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, feb 2015, [arxiv](https://arxiv.org/pdf/1502.01852v1)
* Deep Image: Scaling up Image Recognition, jul 2015, [arxiv](https://arxiv.org/vc/arxiv/papers/1501/1501.02876v1.pdf)
* Rich feature hierarchies for accurate object detection and semantic segmentation, 2014, [paper](http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Girshick_Rich_Feature_Hierarchies_2014_CVPR_paper.pdf)
* Selective Search for Object Recognition, 2012, [paper](https://pdfs.semanticscholar.org/6a65/f9abad1022c7df2c75b819f48251aac23ae8.pdf)
* ImageNet Classification with Deep Convolutional Neural Networks, 2012, [paper](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf) ![getting-started](https://img.shields.io/badge/style-getting_started-yellow.svg?style=flat&label=paper)

## Image Segmentation

* Mask R-CNN, mar 2017, [arxiv](https://arxiv.org/pdf/1703.06870)
* Learning Features by Watching Objects Move, dec 2016, Facebook, [arxiv](https://arxiv.org/pdf/1612.06370.pdf) ![state-of-the-art](https://img.shields.io/badge/style-state_of_the_art-green.svg?style=flat&label=paper)
* Fully Convolutional Networks for Semantic Segmentation, may 2016, [arxiv](https://arxiv.org/pdf/1605.06211v1)
* Instance-aware Semantic Segmentation via Multi-task Network Cascades, dec 2015,
[arxiv](https://arxiv.org/pdf/1512.04412.pdf)
* SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation, dec 2015, [arxiv](https://arxiv.org/pdf/1511.00561v2.pdf)
* Learning Rich Features from RGB-D Images for Object Detection and Segmentation, jul 2014, [arxiv](https://arxiv.org/pdf/1407.5736v1)

## Pose Estimation

* Fast Single Shot Detection and Pose Estimation, sep 2016, [arxiv](https://arxiv.org/pdf/1609.05590.pdf)

## Face Recognition

* Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition, [paper](https://www.cs.cmu.edu/~sbhagava/papers/face-rec-ccs16.pdf)
* OpenFace: A general-purpose face recognition library with mobile applications, jun 2016, [paper](http://reports-archive.adm.cs.cmu.edu/anon/anon/usr0/ftp/2016/CMU-CS-16-118.pdf)
* Deep Face Recognition, 2015, [paper](http://www.robots.ox.ac.uk:5000/~vgg/publications/2015/Parkhi15/parkhi15.pdf)
* Compact Convolutional Neural Network Cascade for Face Detection, aug 2015, [arxiv](https://arxiv.org/pdf/1508.01292v3)
* Learning Robust Deep Face Representation, jul 2015, [arxiv](https://arxiv.org/pdf/1507.04844.pdf)
* FaceNet: A Unified Embedding for Face Recognition and Clustering, jun 2015, [paper](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Schroff_FaceNet_A_Unified_2015_CVPR_paper.pdf)
* Multi-view Face Detection Using Deep Convolutional Neural Networks, feb 2015, Yahoo, [arxiv](https://arxiv.org/pdf/1502.02766v3.pdf)

## Style Transfer

* Deep Photo Style Transfer, mar 2017, [arxiv](https://arxiv.org/pdf/1703.07511)
* A learned representation for artistic style, oct 2016, Google, [arxiv](https://arxiv.org/pdf/1610.07629v1.pdf), [demo](https://magenta.tensorflow.org/2016/11/01/multistyle-pastiche-generator/)
* Fast Style Transfer in TensorFlow, [github](https://github.com/lengstrom/fast-style-transfer/)
* Instance Normalization: The Missing Ingredient for Fast Stylization, sep 2016, [arxiv](https://arxiv.org/abs/1607.08022)
* A
Neural Algorithm of Artistic Style, sep 2015, [arxiv](https://arxiv.org/abs/1508.06576)
* Perceptual Losses for Real-Time Style Transfer and Super-Resolution, [arxiv](https://arxiv.org/pdf/1603.08155v1), [github](http://github.com/jcjohnson/fast-neural-style)

## Logo Recognition

* Deep Learning Logo Detection with Data Expansion by Synthesising Context, jan 2016, [arxiv](https://arxiv.org/pdf/1612.09322.pdf) ![state-of-the-art](https://img.shields.io/badge/style-state_of_the_art-green.svg?style=flat&label=paper)
* Automatic Graphic Logo Detection via Fast Region-based Convolutional Networks, apr 2016, [arxiv](https://arxiv.org/pdf/1604.06083.pdf)
* Logo Localization and Recognition in Natural Images Using Homographic Class Graphs, 2016, [paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.717.9015&rep=rep1&type=pdf)
* LOGO-Net: Large-scale Deep Logo Detection and Brand Recognition with Deep Region-based Convolutional Networks, nov 2015, [arxiv](https://arxiv.org/pdf/1511.02462.pdf)
* DeepLogo: Hitting Logo Recognition with the Deep Neural Network Hammer, oct 2015, Berkeley, [arxiv](https://arxiv.org/pdf/1510.02131v1.pdf)
* Automatic detection of logos in video and their removal using inpainting, jul 2015, [paper](https://github.com/sbrugman/deep-learning-papers/tree/master/papers/automatic-detection-of-logos-in-video-and-their-removal-using-inpainting.pdf)
* On the Benefit of Synthetic Data for Company Logo Detection, 2015, [paper](http://www.multimedia-computing.de/mediawiki/images/c/cf/ACMMM2015.pdf)
* Fast and Robust Realtime Storefront Logo Recognition, [paper](https://stacks.stanford.edu/file/druid:bf950qp8995/Liu.pdf)
* Scalable Logo Recognition in Real-World Images, 2011, [paper](http://www.multimedia-computing.de/mediawiki/images/3/34/ICMR2011_Scalable_Logo_Recognition_in_Real-World_Images.pdf)
* https://arxiv.org/pdf/1609.01414v1.pdf

_note: also includes some papers that use SIFT_

## Text (in the Wild) Recognition

* COCO-Text: Dataset and
Benchmark for Text Detection and Recognition in Natural Images, jun 2016, [arxiv](https://arxiv.org/pdf/1601.07140v2.pdf)
* Recursive Recurrent Nets with Attention Modeling for OCR in the Wild, mar 2016, [arxiv](https://arxiv.org/pdf/1603.03101v1) ![state-of-the-art](https://img.shields.io/badge/style-state_of_the_art-green.svg?style=flat&label=paper)
* OCR Error Correction Using Character Correction and Feature-Based Word Classification, 2016, [arxiv](https://arxiv.org/pdf/1604.06225.pdf)
* Efficient Scene Text Localization and Recognition with Local Character Refinement, apr 2015, [arxiv](https://arxiv.org/pdf/1504.03522.pdf)
* Reading Text in the Wild with Convolutional Neural Networks, dec 2014, [arxiv](https://arxiv.org/pdf/1412.1842v1.pdf)
* Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition, jun 2014, [arxiv](https://arxiv.org/pdf/1406.2227v4.pdf)
* Text Detection and Character Recognition in Scene Images with Unsupervised Feature Learning, 2011, [paper](http://www.cs.stanford.edu/~acoates/papers/coatesetal_icdar_2011.pdf)

## Image / Video Description

* Generation and Comprehension of Unambiguous Object Descriptions, apr 2016, [arxiv](https://arxiv.org/pdf/1511.02283v3.pdf)
* Long-term Recurrent Convolutional Networks for Visual Recognition and Description, may 2016, [arxiv](https://arxiv.org/pdf/1411.4389v4)

## Detect key actor

* Detecting events and key actors in multi-person videos, mar 2015, [arxiv](https://arxiv.org/pdf/1511.02917v2.pdf)

## Self-driving cars

* End to End Learning for Self-Driving Cars, apr 2016, NVIDIA, [arxiv](https://arxiv.org/pdf/1604.07316.pdf)

## ConvNet visualization

* Visualizing and Understanding Convolutional Networks, nov 2013, [arxiv](https://arxiv.org/pdf/1311.2901.pdf)

## Object part detection

* Discovering the physical parts of an articulated object class from multiple videos, 2016,
[paper](http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Del_Pero_Discovering_the_Physical_CVPR_2016_paper.pdf)

## Pedestrian Detection

* Joint Deep Learning for Pedestrian Detection, 2013, [paper](http://www.ee.cuhk.edu.hk/~wlouyang/projects/ouyangWiccv13Joint/material/O&W_ICCV2013.pdf)

## Lip Reading

* Lip Reading in the Wild, 2016, Oxford, [paper](https://github.com/sbrugman/deep-learning-papers/tree/master/papers/lip-reading-in-the-wild.pdf)

## Super Resolution

* RAISR: Rapid and Accurate Image Super Resolution, oct 2016, Google, [arxiv](https://arxiv.org/pdf/1606.01299v3.pdf)
* Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network, sep 2016, Twitter, [arxiv](https://arxiv.org/pdf/1609.05158v2)

## Image Compression

* Full Resolution Image Compression with Recurrent Neural Networks, aug 2016, Google, [arxiv](https://arxiv.org/pdf/1608.05148v1.pdf)

## Automated Theorem Proving

* DeepMath - Deep Sequence Models for Premise Selection, jun 2016, Google, [arxiv](https://arxiv.org/pdf/1606.04442.pdf)

## Reverse Engineering

* Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data, oct 2016, [arxiv](https://arxiv.org/pdf/1610.05755v1)
* Stealing Machine Learning Models via Prediction APIs, aug 2016, [paper](https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_tramer.pdf)

## Language

* Rationalizing Neural Predictions, [github](https://github.com/taolei87/rcnn), [arxiv](https://arxiv.org/pdf/1606.04155v1)

## Translation

* Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation, nov 2016, Google, [arxiv](https://arxiv.org/pdf/1611.04558v1.pdf)
* Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, oct 2016, Google, [arxiv](https://arxiv.org/pdf/1609.08144v2.pdf)

## Age

* Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker, dec
2016, [arxiv](https://arxiv.org/pdf/1612.02572v1)