# scene_text

## Text Detection
[DETRs Beat YOLOs on Real-time Object Detection](https://arxiv.org/abs/2304.08069) -baidu, arxiv2023, [code](https://github.com/lyuwenyu/RT-DETR)
[Real-time Scene Text Detection Based on Global Level and Word Level Features](https://arxiv.org/abs/2203.05251) -arxiv2022
[Kernel Proposal Network for Arbitrary Shape Text Detection](https://arxiv.org/abs/2203.06410) -yinxucheng, TNNLS2022, [code](https://github.com/GXYM/KPN)
[Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion](https://arxiv.org/abs/2202.10304) -baixiang, PAMI2022, [code](https://github.com/MhLiao/DB)
[Towards End-to-End Unified Scene Text Detection and Layout Analysis](https://arxiv.org/abs/2203.15143) -CVPR2022, google, [code](https://github.com/google-research-datasets/hiertext)
[Arbitrary Shape Text Detection using Transformers](https://arxiv.org/abs/2202.11221) -arxiv2022
[Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection](https://arxiv.org/abs/2203.15221) -baixiang, CVPR2022
[Vision-Language Pre-Training for Boosting Scene Text Detectors](https://arxiv.org/abs/2204.13867) -baixiang, CVPR2022
[UNITS: Unsupervised Intermediate Training Stage for Scene Text Detection](https://arxiv.org/abs/2205.04683) -guoyouhui, ICME2022
[Arbitrary Shape Text Detection via Boundary Transformer](https://arxiv.org/abs/2205.05320) -yinxucheng, arxiv2022
[Arbitrary Shape Text Detection via Segmentation with Probability Maps](https://ieeexplore.ieee.org/abstract/document/9779460) -yinxucheng, PAMI2022, [code](https://github.com/GXYM/TextPMs)
[HRRegionNet: Chinese Character Segmentation in Historical Documents with Regional Awareness](https://link.springer.com/chapter/10.1007/978-3-030-86337-1_1) -ICDAR2021
\[Real-Time\][Real-Time Scene Text Detection with Differentiable Binarization](https://arxiv.org/abs/1911.08947) -baixiang, AAAI2020, [code](https://github.com/MhLiao/DB)
[Deep relational reasoning graph network for arbitrary shape text detection](https://openaccess.thecvf.com/content_CVPR_2020/html/Zhang_Deep_Relational_Reasoning_Graph_Network_for_Arbitrary_Shape_Text_Detection_CVPR_2020_paper.html) -yinxucheng, CVPR2020, [code](https://github.com/GXYM/DRRG)
[All you need is boundary: Toward arbitrary-shaped text spotting](https://ojs.aaai.org/index.php/AAAI/article/view/6896) -baixiang, AAAI2020
[All you need is a second look: Towards Tighter Arbitrary shape text detection](https://arxiv.org/abs/2004.12436) -arxiv2020
[Self-Training for Domain Adaptive Scene Text Detection](https://arxiv.org/abs/2005.11487) -arxiv2020
[NENET: An Edge Learnable Network for Link Prediction in Scene Text](https://arxiv.org/abs/2005.12147) -arxiv2020
[Efficient Scene Text Detection with Textual Attention Tower](https://arxiv.org/abs/2002.03741) -Liang Zhang, ICASSP2020
[Scale-Invariant Multi-Oriented Text Detection in Wild Scene Images](https://arxiv.org/abs/2002.06423) -Kinjal Dasgupta, arxiv2020
[PuzzleNet: Scene Text Detection by Segment Context Graph Learning](https://arxiv.org/abs/2002.11371) -Hao Liu, arxiv2020
[Refined Gate: A Simple and Effective Gating Mechanism for Recurrent Units](https://arxiv.org/abs/2002.11338) -Yu Qiao, arxiv2020
[HRCenterNet: An Anchorless Approach to Chinese Character Segmentation in Historical Documents](https://arxiv.org/abs/2012.05739) -BigData2020, [code](https://github.com/Tverous/HRCenterNet)
[Look more than once: An accurate detector for text of arbitrary shapes](https://openaccess.thecvf.com/content_CVPR_2019/html/Zhang_Look_More_Than_Once_An_Accurate_Detector_for_Text_of_CVPR_2019_paper.html) -baidu, CVPR2019
[Gliding vertex on the horizontal bounding box for multi-oriented object detection](https://arxiv.org/abs/1911.09358) -Xiang Bai, arxiv2019, [code](https://github.com/MingtaoFu/gliding_vertex)
[Exploring the Capacity of Sequential-free Box Discretization Network for Omnidirectional Scene Text Detection](https://arxiv.org/abs/1912.09629) -jinlianwen, arxiv2019
[Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network](https://arxiv.org/abs/1908.05900) -face++, ICCV 2019, [code](https://github.com/whai362/pan_pp.pytorch)
[A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning](https://arxiv.org/abs/1908.05498) -Pengfei Wang, arxiv2019
[It's All About The Scale -- Efficient Text Detection Using Adaptive Scaling](https://arxiv.org/abs/1907.12122) -Elad Richardson, arxiv2019
[FaSTExt: Fast and Small Text Extractor](https://arxiv.org/abs/1908.09231) -Alexander Filonenko, arxiv2019
[Curved Text Detection in Natural Scene Images with Semi- and Weakly-Supervised Learning](https://arxiv.org/abs/1908.09990) -Xugong Qin, arxiv2019
[Learning Shape-Aware Embedding for Scene Text Detection](http://jiaya.me/papers/textdetection_cvpr19.pdf) -CUHK, Tencent, CVPR2019
[Shape Robust Text Detection with Progressive Scale Expansion Network](https://arxiv.org/abs/1903.12473) -face++, CVPR2019
[Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation](https://arxiv.org/abs/1905.05980) -Xiaobing Wang, Yingying Jiang, Zhenbo Luo, Cheng-Lin Liu, Hyunsoo Choi, Sungjin Kim, CVPR2019
[Character Region Awareness for Text Detection](https://arxiv.org/abs/1904.01941) -Youngmin Baek, Bado Lee, Dongyoon Han, Sangdoo Yun, Hwalsuk Lee, CVPR2019
[Towards Robust Curve Text Detection with Conditional Spatial Expansion](https://arxiv.org/abs/1903.08836) -Zichuan Liu, Guosheng Lin, Sheng Yang, Fayao Liu, Weisi Lin, Wang Ling Goh, CVPR2019
[Pyramid Mask Text Detector](https://arxiv.org/abs/1903.11800) -sensetime, arxiv2019
[Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes](https://arxiv.org/abs/1904.06535) -baidu, CVPR2019
[Character Region Awareness for Text Detection](https://arxiv.org/pdf/1904.01941.pdf) -Clova, CVPR2019
[Detecting Text in the Wild with Deep Character Embedding Network](https://arxiv.org/abs/1901.00363) -baidu, arxiv2019
[TextField: Learning A Deep Direction Field for Irregular Scene Text Detection](https://arxiv.org/abs/1812.01393) -Yongchao Xu, Yukang Wang, Wei Zhou, Yongpan Wang, Zhibo Yang, Xiang Bai, arxiv2018
[TextMountain: Accurate Scene Text Detection via Instance Segmentation](https://arxiv.org/abs/1811.12786) -Yixing Zhu, Jun Du, arxiv2018
[Mask R-CNN with Pyramid Attention Network for Scene Text Detection](https://arxiv.org/abs/1811.09058) -MSRA, arxiv2018
[Scene Text Detection with Supervised Pyramid Context Network](https://arxiv.org/abs/1811.08605) -face++, AAAI2019
[Pixel-Anchor: A Fast Oriented Scene Text Detector with Combined Networks](https://arxiv.org/abs/1811.07432) -cloudwalk, arxiv2018
[Improving Rotated Text Detection with Rotation Region Proposal Networks](https://arxiv.org/abs/1811.07031) -facebook, arxiv2018
[IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection](https://arxiv.org/abs/1805.01167) -Alibaba, IJCAI2018
[TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes](https://arxiv.org/abs/1807.01544) -peking, face++, arxiv2018
[PSENET: Shape Robust Text Detection with Progressive Scale Expansion Network](https://arxiv.org/abs/1806.02559) -deepinsight, CVPR2019
[Arbitrary-Oriented Scene Text Detection via Rotation Proposals](https://ieeexplore.ieee.org/abstract/document/8323240/) -J Ma, W Shao, H Ye, L Wang, H Wang, TMM2018
[TextBoxes++: A Single-Shot Oriented Scene Text Detector](https://arxiv.org/abs/1801.02765) -Minghui Liao, Baoguang Shi, Xiang Bai, arxiv2018, [code](https://github.com/MhLiao/TextBoxes_plusplus)
[Dense and Tight Detection of Chinese Characters in Historical Documents: Datasets and a Recognition Guided Detector](https://ieeexplore.ieee.org/document/8364534) -JinLianwen, IEEEaccess2018
[R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection](https://arxiv.org/abs/1706.09579) -Samsung, arxiv2018
[Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation](https://arxiv.org/abs/1802.08948) -Pengyuan Lyu, Cong Yao, Wenhao Wu, Shuicheng Yan, Xiang Bai, arxiv2018
[PixelLink: Detecting Scene Text via Instance Segmentation](https://arxiv.org/abs/1801.01315) -Dan Deng, Haifeng Liu, Xuelong Li, Deng Cai, aaai2018
[EAST: an efficient and accurate scene text detector](http://openaccess.thecvf.com/content_cvpr_2017/papers/Zhou_EAST_An_Efficient_CVPR_2017_paper.pdf) -Megvii, cvpr2017, [code](https://github.com/argman/EAST)
[Scene text detection and segmentation based on cascaded convolution neural networks](https://ieeexplore.ieee.org/abstract/document/7828014/) -Y Tang, X Wu, TIP2017
[TextBoxes: A Fast Text Detector with a Single Deep Neural Network.](http://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/download/14202/14295) -M Liao, B Shi, X Bai, X Wang, W Liu, AAAI2017, [code](https://github.com/MhLiao/TextBoxes)
[Deep direct regression for multi-oriented scene text detection](http://openaccess.thecvf.com/content_ICCV_2017/papers/He_Deep_Direct_Regression_ICCV_2017_paper.pdf) -W He, XY Zhang, F Yin, CL Liu, ICCV2017
[Detecting oriented text in natural images by linking segments](http://openaccess.thecvf.com/content_cvpr_2017/papers/Shi_Detecting_Oriented_Text_CVPR_2017_paper.pdf) -B Shi, X Bai, S Belongie, CVPR2017, [code](https://github.com/bgshih/seglink)
[Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection](https://arxiv.org/pdf/1703.01425.pdf) -Yuliang Liu, Lianwen Jin, CVPR2017
[Feature Enhancement Network: A Refined Scene Text Detector](https://arxiv.org/abs/1711.04249) -Sheng Zhang, Yuliang Liu, Lianwen Jin, Canjie Luo, arxiv2017
[Single Shot Text Detector with Regional Attention](http://openaccess.thecvf.com/content_ICCV_2017/papers/He_Single_Shot_Text_ICCV_2017_paper.pdf) -Pan He, Weilin Huang, Tong He, Qile Zhu, Yu Qiao, and Xiaolin Li, ICCV2017
[A Convolutional Neural Network-Based Chinese Text Detection Algorithm via Text Structure Modeling](https://ieeexplore.ieee.org/abstract/document/7733055/) -Xiaohang Ren, Yi Zhou, Jianhua He, Kai Chen, Xiaokang Yang, Jun Sun, TMM2017
[Fused Text Segmentation Networks for Multi-oriented Scene Text Detection](https://arxiv.org/abs/1709.03272) -Yuchen Dai, et al, arxiv2017
[Scene Text Detection with Novel Superpixel Based Character Candidate Extraction](https://ieeexplore.ieee.org/abstract/document/8270087/) -Cong Wang, Fei Yin, Cheng-Lin Liu, ICDAR2017
[WeText: Scene Text Detection under Weak Supervision](https://arxiv.org/abs/1710.04826) -Shangxuan Tian, Shijian Lu, Chongshou Li, ICCV2017
[WordSup: Exploiting Word Annotations for Character based Text Detection](https://arxiv.org/pdf/1708.06720.pdf) -MSRA, IDL, ICCV2017
[Deep Residual Text Detection Network for Scene Text](https://arxiv.org/abs/1711.04147) -Xiangyu Zhu, et al, arxiv2017
[Cascaded Segmentation-Detection Networks for Word-Level Text Spotting](https://arxiv.org/abs/1704.00834) -Siyang Qin, Roberto Manduchi, arxiv2017
[Arbitrary-Oriented Scene Text Detection via Rotation Proposals](https://arxiv.org/pdf/1703.01086.pdf) -Jianqi Ma, et al, TMM2017
[Multi-oriented text detection with fully convolutional networks](http://openaccess.thecvf.com/content_cvpr_2016/papers/Zhang_Multi-Oriented_Text_Detection_CVPR_2016_paper.pdf) -Z Zhang, C Zhang, W Shen, C Yao, CVPR2016
[Scene text detection via holistic, multi-channel prediction](https://arxiv.org/abs/1606.09002) -C Yao, X Bai, N Sang, X Zhou, S Zhou, arxiv2016
## Text Recognition
[Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer](https://arxiv.org/abs/2311.13120) -bytedance, CVPR2024, [code](https://github.com/bytedance/E2STR)
[Revisiting Scene Text Recognition: A Data Perspective](https://openaccess.thecvf.com/content/ICCV2023/html/Jiang_Revisiting_Scene_Text_Recognition_A_Data_Perspective_ICCV_2023_paper.html) -jinlianwen, ICCV2023, [code](https://github.com/Mountchicken/Union14M)
[Context Perception Parallel Decoder for Scene Text Recognition](https://arxiv.org/abs/2307.12270) -baidu, arxiv2023
[CDistNet: Perceiving Multi-domain Character Distance for Robust Text Recognition](https://arxiv.org/pdf/2111.11011) -fudan, IJCV2023, [code](https://github.com/simplify23/CDistNet)
[TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://ojs.aaai.org/index.php/AAAI/article/view/26538) -Microsoft, AAAI2023, [code](https://aka.ms/trocr)
[Context-Based Contrastive Learning for Scene Text Recognition](https://aaai-2022.virtualchair.net/poster_aaai10147) -AAAI2022
[SVTR: Scene Text Recognition with a Single Visual Model](https://arxiv.org/abs/2205.00159) -baidu, IJCAI2022, [code]()
[Multi-modal Text Recognition Networks: Interactive enhancements between visual and semantic features](https://arxiv.org/abs/2111.15263) -ECCV2022
[Reciprocal Feature Learning via Explicit and Implicit Tasks in Scene Text Recognition](https://arxiv.org/abs/2105.06229v2) -hikvision, ICDAR2021, [code](https://github.com/hikopensource/DAVAR-Lab-OCR/tree/main/demo/text_recognition/rflearning)
[Dictionary-Guided Scene Text Recognition](https://openaccess.thecvf.com/content/CVPR2021/html/Nguyen_Dictionary-Guided_Scene_Text_Recognition_CVPR_2021_paper.html) -CVPR2021, [code](https://github.com/VinAIResearch/dict-guided)
[TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) -beihang, arxiv2021, [code](https://aka.ms/TrOCR)
[RecycleNet: An Overlapped Text Instance Recovery Approach](https://dl.acm.org/doi/abs/10.1145/3474085.3481536) -tencent, ACM MM2021
[Vision Transformer for Fast and Efficient Scene Text Recognition](https://arxiv.org/abs/2105.08582) -Rowel, ICDAR2021
[Visual-semantic transformer for scene text recognition](https://arxiv.org/abs/2112.00948) -pingan, arxiv2021
[PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network](https://arxiv.org/abs/2104.05458) -baidu, AAAI2021, [code](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/doc/doc_ch/algorithm_e2e_pgnet.md)
[What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels](https://arxiv.org/abs/2103.04400) -tokyo, CVPR2021, [code](https://github.com/ku21fan/STR-Fewer-Labels)
[Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition](https://arxiv.org/abs/2103.06495) -ShanchengFang, CVPR2021, [code](https://github.com/FangShancheng/ABINet)
\[light\][Hamming OCR: A Locality Sensitive Hashing Neural Network for Scene Text Recognition](https://arxiv.org/abs/2009.10874) -pingan, arxiv2020
[Gaussian Constrained Attention Network for Scene Text Recognition](https://arxiv.org/abs/2010.09169) -qiaozhi, ICPR2020, [code](https://github.com/Pay20Y/GCAN)
[Adaptive Text Recognition through Visual Matching](https://arxiv.org/abs/2009.06610) -zisserman, ECCV2020
[On Vocabulary Reliance in Scene Text Recognition](https://arxiv.org/abs/2005.03959) -megvii, CVPR2020
[Joint Layout Analysis, Character Detection and Recognition for Historical Document Digitization](https://arxiv.org/abs/2007.06890) -JinLianwen, ICFHR2020
[Text Recognition in Real Scenarios with a Few Labeled Samples](https://arxiv.org/abs/2006.12209) -arxiv2020
[RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition](https://arxiv.org/abs/2007.07542) -ECCV2020
[On Recognizing Texts of Arbitrary Shapes With 2D Self-Attention](https://openaccess.thecvf.com/content_CVPRW_2020/html/w34/Lee_On_Recognizing_Texts_of_Arbitrary_Shapes_With_2D_Self-Attention_CVPRW_2020_paper.html) -CVPRW2020
[SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition](https://arxiv.org/abs/2005.10977) -zhi qiao, CVPR2020
[Text Recognition in the Wild: A Survey](https://arxiv.org/abs/2005.03492) -jinlianwen, arxiv2020
[GTC: Guided Training of CTC Towards Efficient and Accurate Scene Text Recognition](https://arxiv.org/abs/2002.01276) -Wenyang Hu, AAAI2020
[A New Perspective for Flexible Feature Gathering in Scene Text Recognition Via Character Anchor Pooling](https://arxiv.org/abs/2002.03509) -yao cong, ICASSP2020
[SCATTER: Selective Context Attentional Scene Text Recognizer](https://arxiv.org/abs/2003.11288) -Ron Litman, CVPR2020
[Scene Text Recognition via Transformer](https://arxiv.org/abs/2003.08077) -Xinjie Feng, arxiv2020
[Efficient Backbone Search for Scene Text Recognition](https://arxiv.org/abs/2003.06567) -baixiang, arxiv2020
[Towards Accurate Scene Text Recognition with Semantic Reasoning Networks](https://arxiv.org/abs/2003.12294v1) -Baidu, CVPR2020
[Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition](https://arxiv.org/abs/2003.06606) -jinlianwen, CVPR2020, [code](https://github.com/Canjie-Luo/Text-Image-Augmentation)
[Decoupled Attention Network for Text Recognition](https://arxiv.org/abs/1912.10205) -jinlianwen, AAAI2020
[Fast Dense Residual Network: Enhancing Global Dense Feature Flow for Text Recognition](https://arxiv.org/abs/2001.09021) -Zhao Zhang, arxiv2020
[Separating Content from Style Using Adversarial Learning for Recognizing Text in the Wild](https://arxiv.org/abs/2001.04189) -jin lianwen, arxiv2020
[TextScanner: Reading Characters in Order for Robust Scene Text Recognition](https://arxiv.org/abs/1912.12422) -yao cong, AAAI2020
[What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis](https://arxiv.org/abs/1904.01906) -clova, ICCV2019, [code](https://github.com/clovaai/deep-text-recognition-benchmark)
[A Feasible Framework for Arbitrary-Shaped Scene Text Recognition](https://arxiv.org/abs/1912.04561) -champion in ICDAR2019, arxiv2019, [code](https://github.com/zhang0jhon/AttentionOCR)
[Deep Neural Network for Semantic-based Text Recognition in Images](https://arxiv.org/abs/1908.01403) -Yi Zheng, arxiv2019
[Symmetry-constrained Rectification Network for Scene Text Recognition](https://arxiv.org/abs/1908.01957) -baixiang, ICCV2019
[Adaptive Embedding Gate for Attention-Based Scene Text Recognition](https://arxiv.org/abs/1908.09475) -Lianwen Jin, arxiv2019
[Focus-Enhanced Scene Text Recognition with Deformable Convolutions](https://arxiv.org/abs/1908.10998) -Yanxiang Gong, arxiv2019, [code](https://github.com/Alpaca07/dtr)
[Rethinking Irregular Scene Text Recognition](https://arxiv.org/abs/1908.11834) -yao cong, ICDAR19 art champion, [code](https://github.com/Jyouhou/ICDAR2019-ArT-Recognition-Alchemy)
[Aggregation Cross-Entropy for Sequence Recognition](https://arxiv.org/abs/1904.08364) -Zecheng Xie, Yaoxiong Huang, Yuanzhi Zhu, Lianwen Jin, Yuliang Liu, Lele Xie, CVPR2019, [code](https://github.com/summerlvsong/Aggregation-Cross-Entropy)
[Sequence-to-Sequence Domain Adaptation Network for Robust Text Image Recognition](http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhang_Sequence-To-Sequence_Domain_Adaptation_Network_for_Robust_Text_Image_Recognition_CVPR_2019_paper.pdf) -CASIA, CVPR2019
[Towards End-to-End Text Spotting in Natural Scenes](https://arxiv.org/abs/1906.06013) -LiHui, et al, arxiv2019
[2D Attentional Irregular Scene Text Recognizer](https://arxiv.org/abs/1906.05708) -Tencent, arxiv2019
[ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification](https://arxiv.org/abs/1812.05824) -Fangneng Zhan, Shijian Lu, CVPR2019
[FACLSTM: ConvLSTM with Focused Attention for Scene Text Recognition](https://arxiv.org/abs/1904.09405) -Qingqing Wang, et al, arxiv2019
[A Multi-Object Rectified Attention Network for Scene Text Recognition](https://arxiv.org/abs/1901.03003) -Canjie Luo, Lianwen Jin, Zenghui Sun, PR2019
[Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition](https://arxiv.org/abs/1811.00751) -Hui Li, Peng Wang, Chunhua Shen, Guyu Zhang, AAAI2019, [code](https://tinyurl.com/ShowAttendRead)
[Scene Text Recognition from Two-Dimensional Perspective](https://arxiv.org/abs/1809.06508) -Minghui Liao, Cong Yao, Xiang Bai, et al, AAAI2019
[Recurrent Calibration Network for Irregular Text Recognition](https://arxiv.org/abs/1812.07145) -Hanqing Lu, arxiv2018
[ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification](https://arxiv.org/abs/1812.05824) -Fangneng Zhan, Shijian Lu, arxiv2018
[Synthetically Supervised Feature Learning for Scene Text Recognition](http://openaccess.thecvf.com/content_ECCV_2018/papers/Yang_Liu_Synthetically_Supervised_Feature_ECCV_2018_paper.pdf) -Adobe, ECCV2018
[Connectionist Temporal Classification with Maximum Entropy Regularization](https://proceedings.neurips.cc/paper/2018/hash/e44fea3bec53bcea3b7513ccef5857ac-Abstract.html) -Tsinghua, NeurIPS2018, [code](https://github.com/liuhu-bigeye/enctc.crnn)
[ASTER: An Attentional Scene Text Recognizer with Flexible Rectification](https://ieeexplore.ieee.org/document/8395027) -Baixiang, PAMI2018, [code](https://github.com/ayumiymk/aster.pytorch)
[Edit Probability for Scene Text Recognition](http://openaccess.thecvf.com/content_cvpr_2018/papers/Bai_Edit_Probability_for_CVPR_2018_paper.pdf) -Fudan, Hikvision, cvpr2018
[SqueezedText: A Real-time Scene Text Recognition by Binary Convolutional Encoder-decoder Network](https://pdfs.semanticscholar.org/0e59/f7d7e9c9380b425a94038c7a2500b2f6063a.pdf) -Zichuan Liu, et al, AAAI2018
[State of the Art Optical Character Recognition of 19th Century Fraktur Scripts using Open Source Engines](https://arxiv.org/abs/1810.03436) -arxiv2018
[SCAN: Sliding Convolutional Attention Network for Scene Text Recognition](https://arxiv.org/abs/1806.00578) -Yichao Wu, et al, arxiv2018
[NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition](https://arxiv.org/abs/1806.00926) -Fenfen Sheng, et al, arxiv2018
[AON: Towards Arbitrarily-Oriented Text Recognition](https://arxiv.org/abs/1711.04226) -Hikvision, et al, CVPR2018
[An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://ieeexplore.ieee.org/abstract/document/7801919/) -B Shi, X Bai, C Yao, TPAMI2017, [code](https://github.com/bgshih/crnn)
[Scene Text Recognition with Sliding Convolutional Character Models](https://arxiv.org/abs/1709.01727) -fei yin, et al, arxiv2017
[Focusing Attention: Towards Accurate Text Recognition in Natural Images](http://openaccess.thecvf.com/content_ICCV_2017/papers/Cheng_Focusing_Attention_Towards_ICCV_2017_paper.pdf) -Hikvision, et al, ICCV2017
[AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition](https://pdfs.semanticscholar.org/2111/d546ac1cbf170302e44a17c88d26b1c55999.pdf) -Chun Yang, Xu-Cheng Yin, arxiv2017
[Strokelets: A learned multi-scale mid-level representation for scene text recognition](https://ieeexplore.ieee.org/abstract/document/7453176/) -X Bai, C Yao, W Liu, TIP2016
[Reading Scene Text in Deep Convolutional Sequences](http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/download/12256/12121) -P He, W Huang, Y Qiao, CC Loy, X Tang, AAAI2016
[Text-Attentional Convolutional Neural Network for Scene Text Detection](http://ieeexplore.ieee.org/abstract/document/7442550/) -Tong He, Weilin Huang, Yu Qiao, Jian Yao, TIP2016
[Robust Scene Text Recognition with Automatic Rectification](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Shi_Robust_Scene_Text_CVPR_2016_paper.pdf) -Baoguang Shi, Xinggang Wang, Pengyuan Lyu, Cong Yao, Xiang Bai, CVPR2016
[DeepText: A Unified Framework for Text Proposal Generation and Text Detection in Natural Images](https://arxiv.org/abs/1605.07314) -Zhuoyao Zhong, Lianwen Jin, Shuye Zhang, Ziyong Feng, arxiv2016
[Recursive Recurrent Nets with Attention Modeling for OCR in the Wild](https://arxiv.org/pdf/1603.03101v1.pdf) -Yahoo, CVPR2016
## End-to-End & Text Spotting
[ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting](https://arxiv.org/abs/2211.10578) -ustc, PAMI2023, [code](https://github.com/FangShancheng/ABINet-PP)
[Language Matters: A Weakly Supervised Pre-training Approach for Scene Text Detection and Spotting](https://arxiv.org/abs/2203.03911) -bytedance, arxiv2022
[DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting](https://arxiv.org/abs/2203.05122) -naver, arxiv2022, [code]()
[SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition](https://arxiv.org/abs/2203.10209) -jinlianwen, CVPR2022, [code](https://github.com/mxin262/SwinTextSpotter)
[End-to-End Video Text Spotting with Transformer](https://arxiv.org/abs/2203.10539) -shenchunhua, arxiv2022, [code](https://github.com/weijiawu/TransDETR)
[Text Spotting Transformers](https://arxiv.org/abs/2204.01918) -intel, CVPR2022
[PP-OCRv3: More Attempts for the Improvement of Ultra Lightweight OCR System](https://arxiv.org/abs/2206.03001) -baidu, arxiv2022
\[light\][PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System](https://arxiv.org/abs/2109.03144) -baidu, arxiv2021, [code](https://github.com/PaddlePaddle/PaddleOCR)
\[icdar competition\][1st Place Solution to ICDAR 2021 RRC-ICTEXT End-to-end Text Spotting and Aesthetic Assessment on Integrated Circuit](https://arxiv.org/abs/2104.03544) -hikvision, arxiv2021
[ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting](https://arxiv.org/abs/2105.03620) -jinlianwen, arxiv2021, [code](https://github.com/aim-uofa/AdelaiDet/)
\[light\][PP-OCR: A Practical Ultra Lightweight OCR System](https://arxiv.org/abs/2009.09941) -baidu, arxiv2020, [code](https://github.com/PaddlePaddle/PaddleOCR)
[Character Region Attention For Text Spotting](https://arxiv.org/abs/2007.09629) -ECCV2020
[Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting](https://arxiv.org/abs/2007.09482) -baixiang, ECCV2020, [code]()
[Text Detection and Recognition in the Wild: A Review](https://arxiv.org/abs/2006.04305) -arxiv2020
[Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting](https://arxiv.org/abs/2002.06820) -Liang Qiao, AAAI2020
[All You Need Is Boundary: Toward Arbitrary-Shaped Text Spotting](https://arxiv.org/abs/1911.09550) -baixiang, AAAI2020
[ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network](https://arxiv.org/abs/2002.10200) -jin lianwen, CVPR2020
[Convolutional Character Networks](https://arxiv.org/abs/1910.07954) -Linjie Xing, Zhi Tian, Weilin Huang, Matthew R. Scott, ICCV2019
[TextDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting](http://openaccess.thecvf.com/content_ICCV_2019/papers/Feng_TextDragon_An_End-to-End_Framework_for_Arbitrary_Shaped_Text_Spotting_ICCV_2019_paper.pdf) -Chenglin Liu, ICCV2019
[Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes](https://arxiv.org/abs/1908.08207) -baixiang, TPAMI2019
[Towards Unconstrained End-to-End Text Spotting](https://arxiv.org/abs/1908.09231) -google ai, arxiv2019
[Towards End-to-End Text Spotting in Natural Scenes](https://arxiv.org/abs/1906.06013) -Hui Li, Peng Wang, Chunhua Shen, arxiv2019
[Weakly supervised precise segmentation for historical document images](https://www.sciencedirect.com/science/article/pii/S0925231219304989) -Jin Lianwen, Neurocomputing2019
[A Novel Integrated Framework for Learning both Text Detection and Recognition](https://arxiv.org/abs/1811.08611) -alibaba, arxiv2018
[TextNet: Irregular Text Reading from Images with an End-to-End Trainable Network](https://arxiv.org/abs/1812.09900) -baidu, arxiv2018
[Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes](https://arxiv.org/abs/1807.02242) -Pengyuan Lyu, Minghui Liao, Cong Yao, Wenhao Wu, Xiang Bai, arxiv2018
[FOTS: Fast Oriented Text Spotting with a Unified Network](https://arxiv.org/abs/1801.01671) -Xuebo Liu, Ding Liang, Shi Yan, Dagui Chen, Yu Qiao, Junjie Yan, CVPR2018
[E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text](https://arxiv.org/abs/1801.09919) -Yash Patel, et al, arxiv2018
[SEE: Towards Semi-Supervised End-to-End Scene Text Recognition](http://arxiv.org/abs/1712.05404) -Christian Bartz, Haojin Yang, Christoph Meinel, AAAI2018
[An end-to-end TextSpotter with Explicit Alignment and Attention](https://arxiv.org/abs/1803.03474) -Tong He, Zhi Tian, Weilin Huang, Chunhua Shen, Yu Qiao, Changming Sun, CVPR2018
[Towards End-to-end Text Spotting with Convolutional Recurrent Neural Networks](http://openaccess.thecvf.com/content_ICCV_2017/papers/Li_Towards_End-To-End_Text_ICCV_2017_paper.pdf) -Hui Li, et al, ICCV2017
[Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework](http://openaccess.thecvf.com/content_ICCV_2017/papers/Busta_Deep_TextSpotter_An_ICCV_2017_paper.pdf) -Michal Busta, et al, ICCV2017, [code](https://github.com/MichalBusta/DeepTextSpotter)
[Reading Text in the Wild with Convolutional Neural Networks](https://link.springer.com/article/10.1007%2Fs11263-015-0823-z) -Max Jaderberg, et al, IJCV2016
## Text Retrieval
[Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers](https://arxiv.org/abs/2103.16553) -zisserman, CVPR2021
[Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval](https://arxiv.org/abs/2104.00650) -zisserman, arxiv2021
[Scene Text Retrieval via Joint Text Detection and Similarity Learning](https://arxiv.org/abs/2104.01552) -baixiang, CVPR2021, [code/CSVTR database](https://github.com/lanfeng4659/STR-TDSL)
### Synthesis
[synthtiger](https://github.com/clovaai/synthtiger)
[Editing Text in the Wild](https://arxiv.org/abs/1908.03047) -baidu, ACM MM 2019
[Data Augmentation for Scene Text Recognition](https://arxiv.org/abs/2108.06949) -ICCV2021 workshop, [code](https://github.com/roatienza/straug)
[text_renderer](https://github.com/oh-my-ocr/text_renderer)
[SynthText](https://github.com/ankush-me/SynthText)
[TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator)
[UnrealText](https://github.com/Jyouhou/UnrealText/)
[ScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation](https://arxiv.org/abs/2003.10557) -Sharon Fogel, CVPR2020
[SynthText3D: Synthesizing Scene Text Images from 3D Virtual Worlds](https://arxiv.org/abs/1907.06007) -Minghui Liao, Boyu Song, Minghang He, Shangbang Long, Cong Yao, Xiang Bai, arxiv2019, [code](https://github.com/MhLiao/SynthText3D)
[Spatial Fusion GAN for Image Synthesis](https://arxiv.org/abs/1812.05840) -Fangneng Zhan, Hongyuan Zhu, Shijian Lu, CVPR2019, [code](https://github.com/Sunshine352/SF-GAN)
[Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes](https://arxiv.org/abs/1807.03021) -Fangneng Zhan, Shijian Lu, Chuhui Xue, ECCV2018
### Evaluation
[CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks](https://arxiv.org/abs/2006.06244) -arxiv2020
[End-To-End Measure for Text Recognition](https://arxiv.org/abs/1908.09584) -ICDAR2019
[Tightness-aware Evaluation Protocol for Scene Text Detection](https://arxiv.org/abs/1904.00813) -jinlianwen, CVPR2019
### Script identification
[Patch Aggregator for Scene Text Script Identification](https://arxiv.org/abs/1912.03818) -baixiang, arxiv2019
### Super Resolution
[Text Gestalt: Stroke-Aware Scene Text Image Super-Resolution](https://ojs.aaai.org/index.php/AAAI/article/view/19904) -fudan, AAAI2022, [code](https://github.com/FudanVI/FudanOCR/tree/main/text-gestalt)
[Restormer: Efficient Transformer for High-Resolution Image Restoration](https://arxiv.org/abs/2111.09881) -google, CVPR2022, [code](https://github.com/swz30/Restormer)
[Scene Text Telescope: Text-Focused Scene Image Super-Resolution](https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Scene_Text_Telescope_Text-Focused_Scene_Image_Super-Resolution_CVPR_2021_paper.html) -fudan, CVPR2021
[Text Prior Guided Scene Text Image Super-resolution](https://arxiv.org/abs/2106.15368) -arxiv2021, [code](https://github.com/mjq11302010044/TPGSR)
[Scene Text Image Super-Resolution in the Wild](https://arxiv.org/abs/2005.03341) -baixiang, ECCV2020
## Other
[AnyText: Multilingual Visual Text Generation And Editing](https://arxiv.org/abs/2311.03054) -alibaba, arxiv2023, [code](https://github.com/tyxsspa/AnyText)
[Stroke-Based Scene Text Erasing Using Synthetic Data for Training](https://ieeexplore.ieee.org/abstract/document/9609970) -TIP2021
[Page Layout Analysis System for Unconstrained Historic Documents](https://arxiv.org/abs/2102.11838) -ICDAR2021
[EraseNet: End-to-End Text Removal in the Wild](https://ieeexplore.ieee.org/document/9180003) -Jinlianwen, TIP2020, [code](https://github.com/HCIILAB/SCUT-EnsText)
[SwapText: Image Based Texts Transfer in Scenes](https://arxiv.org/abs/2003.08152) -Qiangpeng Yang, CVPR2020
[UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World](https://arxiv.org/abs/2003.10608) -Cong Yao, CVPR2020
[EnsNet: Ensconce Text in the Wild](https://ojs.aaai.org/index.php/AAAI/article/view/3859) -JinLianwen, AAAI2019, [code](https://github.com/HCIILAB/SCUT-EnsText)
[TextSR: Content-Aware Text Super-Resolution Guided by Recognition](https://arxiv.org/abs/1909.07113) -forevision, arxiv2019
[Editing Text in the Wild](https://arxiv.org/abs/1908.03047) -baixiang, ACM MM2019
[MTRNet: A Generic Scene Text Eraser](https://ieeexplore.ieee.org/abstract/document/8978083) -ICDAR2019
[Scene Text Detection and Recognition: The Deep Learning Era](https://arxiv.org/abs/1811.04256) -face++, arxiv2018
[Text/non-text image classification in the wild with convolutional neural networks](https://www.sciencedirect.com/science/article/pii/S0031320316303922) -X Bai, B Shi, C Zhang, X Cai, L Qi, PR2017
[Scene text script identification with convolutional recurrent neural networks](http://ieeexplore.ieee.org/abstract/document/7900268/) -J Mei, L Dai, B Shi, X Bai, ICPR2016
## Seq2Seq
[Convolutional Sequence to Sequence Learning](https://arxiv.org/abs/1705.03122) -FAIR, ICML2017
[Sequence Level Training with Recurrent Neural Networks](https://arxiv.org/abs/1511.06732) -FAIR, ICLR2016
[A Convolutional Encoder Model for Neural Machine Translation](https://arxiv.org/abs/1611.02344) -FAIR, arxiv2016
## Reading Order
[LayoutReader: Pre-training of Text and Layout for Reading Order Detection](https://arxiv.org/abs/2108.11591) -MSRA, EMNLP2021, [code/database](https://github.com/microsoft/unilm/tree/master/layoutreader)
## Database & Generation
### chinese
[TRW15: ICDAR 2015 Text Reading in the Wild Competition](https://arxiv.org/abs/1506.03184)
RCTW-17: ICDAR2017-Reading Chinese Text in the Wild
STV2k: A New Benchmark for Scene Text Detection and Recognition
[CTW: Chinese Text in the Wild](https://ctwdataset.github.io/)
PAL10K
[COCO TS Dataset](https://arxiv.org/abs/1904.00818)
[ICPR MTWI 2018 Challenge 1: Text Recognition of Web Images](https://tianchi.aliyun.com/competition/information.htm?raceId=231650)
### other
[Comprehensive Benchmark Datasets for Amharic Scene Text Detection and Recognition](https://arxiv.org/abs/2203.12165)
[Textual Visual Semantic Dataset for Text Spotting](https://github.com/HCIILAB/dataset)
[RoadText-1K: Text Detection & Recognition Dataset for Driving Videos](https://arxiv.org/abs/2005.09496)
[DDI-100: Dataset for Text Detection and Recognition](https://arxiv.org/abs/1912.11658)
[Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes](https://arxiv.org/pdf/1807.03021.pdf) -Fangneng Zhan, Shijian Lu, and Chuhui Xue, arxiv2018
[Total-Text](https://github.com/cs-chan/Total-Text-Dataset) -1555 images
[SCUT-CTW1500](https://github.com/Yuliang-Liu/Curve-Text-Detector) -Curved text in the wild
[MLT: Multi-lingual scene text detection and script identification](http://rrc.cvc.uab.es/?ch=8) -Multi-lingual text: 18,000 images, 9 different languages representing 6 different scripts
[Synthetic Word Dataset](http://www.robots.ox.ac.uk/~vgg/data/text/), [Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition](https://arxiv.org/abs/1406.2227)
[Total-text: A comprehensive dataset for scene text detection and recognition](https://ieeexplore.ieee.org/abstract/document/8270088/) -Chee Kheng Ch'ng, Chee Seng Chan
Street View Text(SVT)
IIIT 5k-words
MSRA-TD500
KAIST Scene_Text Database
ICDAR2011, ICDAR2013, ICDAR2015, ICDAR2017, robust reading-Focused Scene Text
ICDAR2017-ICDAR 2017 Robust Reading Challenge on Omnidirectional Video(DOST)
COCO-Text
Google French Street Name Signs (FSNS) dataset
ICDAR2017-ICDAR2017 Competition on Multi-lingual scene text detection and script identification(MLT)
ICDAR2017-Born-Digital Images (Web and Email)
[Detecting Curve Text in the Wild: New Dataset and New Solution](https://arxiv.org/abs/1712.02170)
[Synthetic Word](http://www.robots.ox.ac.uk/~vgg/data/text/)
[Synthetic Data for Text Localisation in Natural Images](https://www.cv-foundation.org/openaccess/content_cvpr_2016/app/S10-06.pdf) -Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, CVPR2016
### vietnamese
[VinText](https://github.com/VinAIResearch/dict-guided)
## Competition
[ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17)](https://arxiv.org/abs/1708.09585) -B Shi, C Yao, M Liao, M Yang, P Xu, L Cui, arxiv2017
[ICDAR 2015 competition on robust reading](https://ieeexplore.ieee.org/abstract/document/7333942/)
[Incidental Scene Text Understanding: Recent Progresses on ICDAR 2015 Robust Reading Competition Challenge 4](https://arxiv.org/abs/1511.09207) -Cong Yao, Jianan Wu, Xinyu Zhou, Chi Zhang, Shuchang Zhou, Zhimin Cao, Qi Yin
## Link
[awesome-deep-text-detection-recognition](https://github.com/hwalsuklee/awesome-deep-text-detection-recognition)
[Awesome-Scene-Text-Recognition](https://github.com/chongyangtao/Awesome-Scene-Text-Recognition)
[Scene Text Detection](https://paperswithcode.com/task/scene-text-detection/codeless)