[
  {
    "path": "README.md",
    "content": "# scene_text\n\n## Text Detection\n[DETRs Beat YOLOs on Real-time Object Detection](https://arxiv.org/abs/2304.08069) -baidu, arxiv2023, [code](https://github.com/lyuwenyu/RT-DETR)<br>\n[Real-time Scene Text Detection Based on Global Level and Word Level Features](https://arxiv.org/abs/2203.05251) -arxiv2022<br>\n[Kernel Proposal Network for Arbitrary Shape Text Detection](https://arxiv.org/abs/2203.06410) -yinxucheng, TNNLS2022,[code](https://github.com/GXYM/KPN)<br>\n[Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion](https://arxiv.org/abs/2202.10304) -baixiang, PAMI2022, [code](https://github.com/MhLiao/DB)<br>\n[Towards End-to-End Unified Scene Text Detection and Layout Analysis](https://arxiv.org/abs/2203.15143) -CVPR2022, google, [code](https://github.com/google-research-datasets/hiertext)<br>\n[Arbitrary Shape Text Detection using Transformers](https://arxiv.org/abs/2202.11221) -arxiv2022<br>\n[Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection](https://arxiv.org/abs/2203.15221) -baixiang, CVPR2022<br>\n[Vision-Language Pre-Training for Boosting Scene Text Detectors](https://arxiv.org/abs/2204.13867) -baixiang, CVPR2022<br>\n[UNITS: Unsupervised Intermediate Training Stage for Scene Text Detection](https://arxiv.org/abs/2205.04683) -guoyouhui, ICME2022<br>\n[Arbitrary Shape Text Detection via Boundary Transformer](https://arxiv.org/abs/2205.05320) -yinxucheng, arxiv2022<br>\n[Arbitrary Shape Text Detection via Segmentation with Probability Maps](https://ieeexplore.ieee.org/abstract/document/9779460) -yinxucheng, PAMI2022, [code](https://github.com/GXYM/TextPMs)<br>\n[HRRegionNet: Chinese Character Segmentation in Historical Documents with Regional Awareness](https://link.springer.com/chapter/10.1007/978-3-030-86337-1_1) -ICDAR2021<br>\n\\[Real-Time\\][Real-Time Scene Text Detection with Differentiable Binarization](https://arxiv.org/abs/1911.08947) -baixiang, AAAI2020, 
[code](https://github.com/MhLiao/DB)<br>\n[Deep relational reasoning graph network for arbitrary shape text detection](https://openaccess.thecvf.com/content_CVPR_2020/html/Zhang_Deep_Relational_Reasoning_Graph_Network_for_Arbitrary_Shape_Text_Detection_CVPR_2020_paper.html) -yinxucheng, CVPR2020, [code](https://github.com/GXYM/DRRG)<br>\n[All you need is boundary: Toward arbitrary-shaped text spotting](https://ojs.aaai.org/index.php/AAAI/article/view/6896) -baixiang, AAAI2020<br>\n[All you need is a second look: Towards Tighter Arbitrary shape text detection](https://arxiv.org/abs/2004.12436) -arxiv2020<br>\n[Self-Training for Domain Adaptive Scene Text Detection](https://arxiv.org/abs/2005.11487) -arxiv2020<br>\n[NENET: An Edge Learnable Network for Link Prediction in Scene Text](https://arxiv.org/abs/2005.12147) -arxiv2020<br>\n[Efficient Scene Text Detection with Textual Attention Tower](https://arxiv.org/abs/2002.03741) -Liang Zhang, ICASSP2020<br>\n[Scale-Invariant Multi-Oriented Text Detection in Wild Scene Images](https://arxiv.org/abs/2002.06423) -Kinjal Dasgupta, arxiv2020<br>\n[PuzzleNet: Scene Text Detection by Segment Context Graph Learning](https://arxiv.org/abs/2002.11371) -Hao Liu, arxiv2020<br>\n[Refined Gate: A Simple and Effective Gating Mechanism for Recurrent Units](https://arxiv.org/abs/2002.11338) -Yu Qiao, arxiv2020<br>\n[HRCenterNet: An Anchorless Approach to Chinese Character Segmentation in Historical Documents](https://arxiv.org/abs/2012.05739) -BigData2020, [code](https://github.com/Tverous/HRCenterNet)<br>\n[Look more than once: An accurate detector for text of arbitrary shapes](https://openaccess.thecvf.com/content_CVPR_2019/html/Zhang_Look_More_Than_Once_An_Accurate_Detector_for_Text_of_CVPR_2019_paper.html) -baidu, CVPR2019<br>\n[Gliding vertex on the horizontal bounding box for multi-oriented object detection](https://arxiv.org/abs/1911.09358) -Xiang Bai, arxiv2019[code](https://github.com/MingtaoFu/gliding_vertex)<br>\n[Exploring 
the Capacity of Sequential-free Box Discretization Network for Omnidirectional Scene Text Detection](https://arxiv.org/abs/1912.09629) -jinlianwen, arxiv2019<br>\n[Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network](https://arxiv.org/abs/1908.05900) -face++,  ICCV 2019, [code](https://github.com/whai362/pan_pp.pytorch)<br>\n[A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning](https://arxiv.org/abs/1908.05498) -Pengfei Wang, arxiv2019<br>\n[It's All About The Scale -- Efficient Text Detection Using Adaptive Scaling](https://arxiv.org/abs/1907.12122) -Elad Richardson, arxiv2019<br>\n[FaSTExt: Fast and Small Text Extractor](https://arxiv.org/abs/1908.09231) -Alexander Filonenko, arxiv2019<br>\n[Curved Text Detection in Natural Scene Images with Semi- and Weakly-Supervised Learning](https://arxiv.org/abs/1908.09990) -Xugong Qin, arxiv2019<br>\n[Learning Shape-Aware Embedding for Scene Text Detection](http://jiaya.me/papers/textdetection_cvpr19.pdf) -CUHK, Tencent, CVPR2019<br>\n[Shape Robust Text Detection with Progressive Scale Expansion Network](https://arxiv.org/abs/1903.12473) -megi++, CVPR2019<br>\n[Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation](https://arxiv.org/abs/1905.05980) -Xiaobing Wang, Yingying Jiang, Zhenbo Luo, Cheng-Lin Liu, Hyunsoo Choi, Sungjin Kim, CVPR2019<br>\n[Character Region Awareness for Text Detection](https://arxiv.org/abs/1904.01941) -Youngmin Baek, Bado Lee, Dongyoon Han, Sangdoo Yun, Hwalsuk Lee, CVPR2019<br>\n[Towards Robust Curve Text Detection with Conditional Spatial Expansion](https://arxiv.org/abs/1903.08836) -Zichuan Liu, Guosheng Lin, Sheng Yang, Fayao Liu, Weisi Lin, Wang Ling Goh, CVPR2019<br>\n[Pyramid Mask Text Detector](https://arxiv.org/abs/1903.11800) -sensetime, arxiv2019<br>\n[Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes](https://arxiv.org/abs/1904.06535) -baidu, CVPR2019<br>\n[Character 
Region Awareness for Text Detection](https://arxiv.org/pdf/1904.01941.pdf) -Clova, CVPR2019<br>\n[Detecting Text in the Wild with Deep Character Embedding Network](https://arxiv.org/abs/1901.00363) -baidu, arxiv2019<br>\n[TextField: Learning A Deep Direction Field for Irregular Scene Text Detection](https://arxiv.org/abs/1812.01393) -Yongchao Xu, Yukang Wang, Wei Zhou, Yongpan Wang, Zhibo Yang, Xiang Bai, arxiv2018<br>\n[TextMountain: Accurate Scene Text Detection via Instance Segmentation](https://arxiv.org/abs/1811.12786) -Yixing Zhu, Jun Du, arxiv2018<br>\n[Mask R-CNN with Pyramid Attention Network for Scene Text Detection](https://arxiv.org/abs/1811.09058) -MSRA, arxiv2018<br>\n[Scene Text Detection with Supervised Pyramid Context Network](https://arxiv.org/abs/1811.08605) -face++, AAAI2019<br>\n[Pixel-Anchor: A Fast Oriented Scene Text Detector with Combined Networks](https://arxiv.org/abs/1811.07432) -cloudwalk, arxiv2018<br>\n[Improving Rotated Text Detection with Rotation Region Proposal Networks](https://arxiv.org/abs/1811.07031) -facebook, arxiv2018<br>\n[IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection](https://arxiv.org/abs/1805.01167) -Alibaba, IJCAI2018<br>\n[TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes](https://arxiv.org/abs/1807.01544) -peking, face++, arxiv2018<br>\n[PSENET: Shape Robust Text Detection with Progressive Scale Expansion Network](https://arxiv.org/abs/1806.02559) -deepinsight, CVPR2019<br>\n[Arbitrary-Oriented Scene Text Detection via Rotation Proposals](https://ieeexplore.ieee.org/abstract/document/8323240/) -J Ma, W Shao, H Ye, L Wang, H Wang, TMM2018<br>\n[TextBoxes++: A Single-Shot Oriented Scene Text Detector](https://arxiv.org/abs/1801.02765) -Minghui Liao, Baoguang Shi, Xiang Bai, arxiv2018 [code](https://github.com/MhLiao/TextBoxes_plusplus)<br>\n[Dense and Tight Detection of Chinese Characters in Historical Documents: Datasets and a 
Recognition Guided Detector](https://ieeexplore.ieee.org/document/8364534) -JinLianwen, IEEEaccess2018<br>\n[R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection](https://arxiv.org/abs/1706.09579) -Samsung, arxiv2018<br>\n[Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation](https://arxiv.org/abs/1802.08948) -Pengyuan Lyu, Cong Yao, Wenhao Wu, Shuicheng Yan, Xiang Bai, arxiv2018<br>\n[PixelLink: Detecting Scene Text via Instance Segmentation](https://arxiv.org/abs/1801.01315) -Dan Deng, Haifeng Liu, Xuelong Li, Deng Cai, aaai2018<br>\n[EAST: an efficient and accurate scene text detector](http://openaccess.thecvf.com/content_cvpr_2017/papers/Zhou_EAST_An_Efficient_CVPR_2017_paper.pdf) -Megvii, cvpr2017, [code](https://github.com/argman/EAST)<br>\n[Scene text detection and segmentation based on cascaded convolution neural networks](https://ieeexplore.ieee.org/abstract/document/7828014/) -Y Tang, X Wu, TIP2017<br>\n[TextBoxes: A Fast Text Detector with a Single Deep Neural Network.](http://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/download/14202/14295) -M Liao, B Shi, X Bai, X Wang, W Liu, AAAI2017, [code](https://github.com/MhLiao/TextBoxes)<br>\n[Deep direct regression for multi-oriented scene text detection](http://openaccess.thecvf.com/content_ICCV_2017/papers/He_Deep_Direct_Regression_ICCV_2017_paper.pdf) -W He, XY Zhang, F Yin, CL Liu, ICCV2017<br>\n[Detecting oriented text in natural images by linking segments](http://openaccess.thecvf.com/content_cvpr_2017/papers/Shi_Detecting_Oriented_Text_CVPR_2017_paper.pdf) -B Shi, X Bai, S Belongie, CVPR2017, [code](https://github.com/bgshih/seglink)<br>\n[Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection](https://arxiv.org/pdf/1703.01425.pdf) -Yuliang Liu, Lianwen Jin, CVPR2017<br>\n[Feature Enhancement Network: A Refined Scene Text Detector](https://arxiv.org/abs/1711.04249) -Sheng Zhang, Yuliang Liu, Lianwen Jin, Canjie Luo, 
arxiv2017<br>\n[Single Shot Text Detector with Regional Attention](http://openaccess.thecvf.com/content_ICCV_2017/papers/He_Single_Shot_Text_ICCV_2017_paper.pdf) -Pan He, Weilin Huang, Tong He, Qile Zhu, Yu Qiao, and Xiaolin Li, ICCV2017<br>\n[A Convolutional Neural Network-Based Chinese Text Detection Algorithm via Text Structure Modeling](https://ieeexplore.ieee.org/abstract/document/7733055/) -Xiaohang Ren, Yi Zhou, Jianhua He, Kai Chen, Xiaokang Yang, Jun Sun, TMM2017<br>\n[Fused Text Segmentation Networks for Multi-oriented Scene Text Detection](https://arxiv.org/abs/1709.03272) -Yuchen Dai, et al, arxiv2017<br>\n[Scene Text Detection with Novel Superpixel Based Character Candidate Extraction](https://ieeexplore.ieee.org/abstract/document/8270087/) -Cong Wang, Fei Yin, Cheng-Lin Liu, ICDAR2017<br>\n[WeText: Scene Text Detection under Weak Supervision](https://arxiv.org/abs/1710.04826) -Shangxuan Tian, Shijian Lu, Chongshou Li, ICCV2017<br>\n[WordSup: Exploiting Word Annotations for Character based Text Detection](https://arxiv.org/pdf/1708.06720.pdf) -MSRA, IDL, ICCV2017<br>\n[Deep Residual Text Detection Network for Scene Text](https://arxiv.org/abs/1711.04147) -Xiangyu Zhu, et al, arxiv2017<br>\n[Cascaded Segmentation-Detection Networks for Word-Level Text Spotting](https://arxiv.org/abs/1704.00834) -Siyang Qin, Roberto Manduchi, arxiv2017<br>\n[Arbitrary-Oriented Scene Text Detection via Rotation Proposals](https://arxiv.org/pdf/1703.01086.pdf) -Jianqi Ma, et al, TMM2017<br>\n[Multi-oriented text detection with fully convolutional networks](http://openaccess.thecvf.com/content_cvpr_2016/papers/Zhang_Multi-Oriented_Text_Detection_CVPR_2016_paper.pdf) -Z Zhang, C Zhang, W Shen, C Yao, CVPR2016<br>\n[Scene text detection via holistic, multi-channel prediction](https://arxiv.org/abs/1606.09002) -C Yao, X Bai, N Sang, X Zhou, S Zhou, arxiv2016<br>\n\n## Text Recognition\n[Multi-modal In-Context Learning Makes an Ego-evolving Scene Text 
Recognizer](https://arxiv.org/abs/2311.13120) -bytedance, CVPR2024, [code](https://github.com/bytedance/E2STR)<br>\n[Revisiting Scene Text Recognition: A Data Perspective](https://openaccess.thecvf.com/content/ICCV2023/html/Jiang_Revisiting_Scene_Text_Recognition_A_Data_Perspective_ICCV_2023_paper.html) -jinlianwen, ICCV2023, [code](https://github.com/Mountchicken/Union14M)<br>\n[Context Perception Parallel Decoder for Scene Text Recognition](https://arxiv.org/abs/2307.12270) -baidu,arxiv2023<br>\n[Cdistnet: Perceiving multi-domain character distance for robust text recognition](https://arxiv.org/pdf/2111.11011) -fudan, IJCV2023, [code](https://github.com/simplify23/CDistNet)<br>\n[Trocr: Transformer-based optical character recognition with pre-trained models](https://ojs.aaai.org/index.php/AAAI/article/view/26538) -Microsoft, AAAI2023, [code](https://aka.ms/trocr)<br>\n[Context-Based Contrastive Learning for Scene Text Recognition](https://aaai-2022.virtualchair.net/poster_aaai10147) -AAAI2022<br>\n[SVTR: Scene Text Recognition with a Single Visual Model](https://arxiv.org/abs/2205.00159) -baidu, IJCAI2022, [code]()<br>\n[Multi-modal Text Recognition Networks: Interactive enhancements between visual and semantic features](https://arxiv.org/abs/2111.15263) -ECCV2022<br>\n[Reciprocal Feature Learning via Explicit and Implicit Tasks in Scene Text Recognition](https://arxiv.org/abs/2105.06229v2) -hikvision, ICDAR2021, [code](https://github.com/hikopensource/DAVAR-Lab-OCR/tree/main/demo/text_recognition/rflearning)<br>\n[Dictionary-Guided Scene Text Recognition](https://openaccess.thecvf.com/content/CVPR2021/html/Nguyen_Dictionary-Guided_Scene_Text_Recognition_CVPR_2021_paper.html) -CVPR2021,[code](https://github.com/VinAIResearch/dict-guided)<br>\n[TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) -beihang, arxiv2021, [code](https://aka.ms/TrOCR)<br>\n[RecycleNet: An Overlapped Text Instance Recovery 
Approach](https://dl.acm.org/doi/abs/10.1145/3474085.3481536) -tencent, MMM21<br>\n[Vision Transformer for Fast and Efficient Scene Text Recognition](https://arxiv.org/abs/2105.08582) -Rowel, ICDAR2021<br>\n[Visual-semantic transformer for scene text recognition](https://arxiv.org/abs/2112.00948) -pingan, arxiv2021<br>\n[PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network](https://arxiv.org/abs/2104.05458) -baidu, AAAI2021, [code](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/doc/doc_ch/algorithm_e2e_pgnet.md)<br>\n[What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels](https://arxiv.org/abs/2103.04400) -tokyo, CVPR2021, [code](https://github.com/ku21fan/STR-Fewer-Labels)<br>\n[Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition](https://arxiv.org/abs/2103.06495) -ShanchengFang, CVPR2021, [code](https://github.com/FangShancheng/ABINet)<br>\n\\[light\\][Hamming OCR: A Locality Sensitive Hashing Neural Network for Scene Text Recognition](https://arxiv.org/abs/2009.10874) -pingan, arxiv2020<br>\n[Gaussian Constrained Attention Network for Scene Text Recognition](https://arxiv.org/abs/2010.09169) -qiaozhi, ICPR2020, [code](https://github.com/Pay20Y/GCAN)<br>\n[Adaptive Text Recognition through Visual Matching](https://arxiv.org/abs/2009.06610) -zisserman, ECCV2020<br>\n[On Vocabulary Reliance in Scene Text Recognition](https://arxiv.org/abs/2005.03959) -megvii, CVPR2020<br>\n[Joint Layout Analysis, Character Detection and Recognition for Historical Document Digitization](https://arxiv.org/abs/2007.06890) -JinLianwen, ICFHR2020<br>\n[Text Recognition in Real Scenarios with a Few Labeled Samples](https://arxiv.org/abs/2006.12209) -arxiv2020<br> \n[RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition](https://arxiv.org/abs/2007.07542) -ECCV2020<br>\n[On Recognizing Texts of Arbitrary Shapes With 2D 
Self-Attention](https://openaccess.thecvf.com/content_CVPRW_2020/html/w34/Lee_On_Recognizing_Texts_of_Arbitrary_Shapes_With_2D_Self-Attention_CVPRW_2020_paper.html) -CVPRW2020<br>\n[SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition](https://arxiv.org/abs/2005.10977) -zhi qiao, CVPR2020<br>\n[Text Recognition in the Wild: A Survey](https://arxiv.org/abs/2005.03492) -jinlianwen, arxiv2020<br>\n[GTC: Guided Training of CTC Towards Efficient and Accurate Scene Text Recognition](https://arxiv.org/abs/2002.01276) -Wenyang Hu, AAAI2020<br>\n[A New Perspective for Flexible Feature Gathering in Scene Text Recognition Via Character Anchor Pooling](https://arxiv.org/abs/2002.03509) -yao cong, ICASSP2020<br>\n[SCATTER: Selective Context Attentional Scene Text Recognizer](https://arxiv.org/abs/2003.11288) -Ron Litman, CVPR2020<br>\n[Scene Text Recognition via Transformer](https://arxiv.org/abs/2003.08077) -Xinjie Feng, arxiv2020<br>\n[Efficient Backbone Search for Scene Text Recognition](https://arxiv.org/abs/2003.06567) -baixiang, arxiv2020<br>\n[Towards Accurate Scene Text Recognition with Semantic Reasoning Networks](https://arxiv.org/abs/2003.12294v1) -Baidu, CVPR2020<br>\n[Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition](https://arxiv.org/abs/2003.06606) -jinlianwen, CVPR2020, [code](https://github.com/Canjie-Luo/Text-Image-Augmentation)<br>\n[Decoupled Attention Network for Text Recognition](https://arxiv.org/abs/1912.10205) -jinlianwen, AAAI2020<br>\n[Fast Dense Residual Network: Enhancing Global Dense Feature Flow for Text Recognition](https://arxiv.org/abs/2001.09021) -Zhao Zhang, arxiv2020<br>\n[Separating Content from Style Using Adversarial Learning for Recognizing Text in the Wild](https://arxiv.org/abs/2001.04189) -jin lianwen, arxiv2020<br>\n[TextScanner: Reading Characters in Order for Robust Scene Text Recognition](https://arxiv.org/abs/1912.12422) -yao cong, AAAI2020<br>\n[What Is Wrong With 
Scene Text Recognition Model Comparisons? Dataset and Model Analysis](https://arxiv.org/abs/1904.01906) -clova, ICCV2019, [code](https://github.com/clovaai/deep-text-recognition-benchmark)<br>\n[A Feasible Framework for Arbitrary-Shaped Scene Text Recognition](https://arxiv.org/abs/1912.04561) -champion in ICDAR2019, arxiv2019, [code](https://github.com/zhang0jhon/AttentionOCR)<br>\n[Deep Neural Network for Semantic-based Text Recognition in Images](https://arxiv.org/abs/1908.01403) -Yi Zheng, arxiv2019<br>\n[Symmetry-constrained Rectification Network for Scene Text Recognition](https://arxiv.org/abs/1908.01957) -baixiang, ICCV2019<br>\n[Adaptive Embedding Gate for Attention-Based Scene Text Recognition](https://arxiv.org/abs/1908.09475) -Lianwen Jin, arxiv2019<br>\n[Focus-Enhanced Scene Text Recognition with Deformable Convolutions](https://arxiv.org/abs/1908.10998) -Yanxiang Gong, arxiv2019, [code](https://github.com/Alpaca07/dtr)<br>\n[Rethinking Irregular Scene Text Recognition](https://arxiv.org/abs/1908.11834) -yao cong, ICDAR19 art champion, [code](https://github.com/Jyouhou/ICDAR2019-ArT-Recognition-Alchemy)<br>\n[Aggregation Cross-Entropy for Sequence Recognition](https://arxiv.org/abs/1904.08364) -Zecheng Xie, Yaoxiong Huang, Yuanzhi Zhu, Lianwen Jin, Yuliang Liu, Lele Xie, CVPR2019, [code](https://github.com/summerlvsong/Aggregation-Cross-Entropy)<br>\n[Sequence-to-Sequence Domain Adaptation Network for Robust Text Image Recognition](http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhang_Sequence-To-Sequence_Domain_Adaptation_Network_for_Robust_Text_Image_Recognition_CVPR_2019_paper.pdf) -CASIA, CVPR2019<br>\n[Towards End-to-End Text Spotting in Natural Scenes](https://arxiv.org/abs/1906.06013) -LiHui, et al, arxiv2019<br>\n[2D Attentional Irregular Scene Text Recognizer](https://arxiv.org/abs/1906.05708) -Tencent, arxiv2019<br>\n[ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification](https://arxiv.org/abs/1812.05824) -Fangneng Zhan, 
Shijian Lu, CVPR2019<br>\n[FACLSTM: ConvLSTM with Focused Attention for Scene Text Recognition](https://arxiv.org/abs/1904.09405) -Qingqing Wang, et al, arxiv2019<br>\n[A Multi-Object Rectified Attention Network for Scene Text Recognition](https://arxiv.org/abs/1901.03003) -Canjie Luo, Lianwen Jin, Zenghui Sun, PR2019<br>\n[Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition](https://arxiv.org/abs/1811.00751) -Hui Li, Peng Wang, Chunhua Shen, Guyu Zhang, AAAI2019, [code](https://tinyurl.com/ShowAttendRead)<br>\n[Scene Text Recognition from Two-Dimensional Perspective](https://arxiv.org/abs/1809.06508) -Minghui Liao, Cong Yao, Xiang Bai, et al, AAAI2019<br>\n[Recurrent Calibration Network for Irregular Text Recognition](https://arxiv.org/abs/1812.07145) -Hanqing Lu, arxiv2018<br>\n[ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification](https://arxiv.org/abs/1812.05824) -Fangneng Zhan, Shijian Lu, arxiv2018<br>\n[Synthetically Supervised Feature Learning for Scene Text Recognition](http://openaccess.thecvf.com/content_ECCV_2018/papers/Yang_Liu_Synthetically_Supervised_Feature_ECCV_2018_paper.pdf) -Adobe, ECCV2018<br>\n[Connectionist Temporal Classification with Maximum Entropy Regularization](https://proceedings.neurips.cc/paper/2018/hash/e44fea3bec53bcea3b7513ccef5857ac-Abstract.html) -Tsinghua, NeurIPS2018,[code](https://github.com/liuhu-bigeye/enctc.crnn)<br>\n[ASTER: An Attentional Scene Text Recognizer with Flexible Rectification](https://ieeexplore.ieee.org/document/8395027) -Baixiang, PAMI2018, [code](https://github.com/ayumiymk/aster.pytorch)<br>\n[Edit Probability for Scene Text Recognition](http://openaccess.thecvf.com/content_cvpr_2018/papers/Bai_Edit_Probability_for_CVPR_2018_paper.pdf) -Fudan, Hikvision, cvpr2018<br>\n[SqueezedText: A Real-time Scene Text Recognition by Binary Convolutional Encoder-decoder Network](https://pdfs.semanticscholar.org/0e59/f7d7e9c9380b425a94038c7a2500b2f6063a.pdf) -Zichuan 
Liu, et al, AAAI2018<br>\n[State of the Art Optical Character Recognition of 19th Century Fraktur Scripts using Open Source Engines](https://arxiv.org/abs/1810.03436) -arxiv2018<br>\n[SCAN: Sliding Convolutional Attention Network for Scene Text Recognition](https://arxiv.org/abs/1806.00578) -Yichao Wu, et al, arxiv2018<br>\n[NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition](https://arxiv.org/abs/1806.00926) -Fenfen Sheng, et al, arxiv2018<br>\n[AON: Towards Arbitrarily-Oriented Text Recognition](https://arxiv.org/abs/1711.04226) -Hikvision, et al, CVPR2018<br>\n[An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://ieeexplore.ieee.org/abstract/document/7801919/) -B Shi, X Bai, C Yao , TPAMI2017 [code](https://github.com/bgshih/crnn)<br>\n[Scene Text Recognition with Sliding Convolutional Character Models](https://arxiv.org/abs/1709.01727) -fei yin, et al, arxiv2017<br>\n[Focusing Attention: Towards Accurate Text Recognition in Natural Images](http://openaccess.thecvf.com/content_ICCV_2017/papers/Cheng_Focusing_Attention_Towards_ICCV_2017_paper.pdf) -Hikvision, et al, ICCV2017<br>\n[AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition](https://pdfs.semanticscholar.org/2111/d546ac1cbf170302e44a17c88d26b1c55999.pdf) -Chun Yang, Xu-Cheng Yin, arxiv2017<br>\n[Strokelets: A learned multi-scale mid-level representation for scene text recognition](https://ieeexplore.ieee.org/abstract/document/7453176/) -X Bai, C Yao, W Liu , TIP2016<br>\n[Reading Scene Text in Deep Convolutional Sequences](http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/download/12256/12121) -P He, W Huang, Y Qiao, CC Loy, X Tang, AAAI2016<br>\n[Text-Attentional Convolutional Neural Network for Scene Text Detection](http://ieeexplore.ieee.org/abstract/document/7442550/) -Tong He, Weilin Huang, Yu Qiao, Jian Yao, TIP2016<br>\n[Robust Scene Text Recognition with Automatic 
Rectification](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Shi_Robust_Scene_Text_CVPR_2016_paper.pdf) -Baoguang Shi, Xinggang Wang, Pengyuan Lyu, Cong Yao, Xiang Bai, CVPR2016<br>\n[DeepText: A Unified Framework for Text Proposal Generation and Text Detection in Natural Images](https://arxiv.org/abs/1605.07314) -Zhuoyao Zhong, Lianwen Jin, Shuye Zhang, Ziyong Feng, arxiv2016<br>\n[Recursive Recurrent Nets with Attention Modeling for OCR in the Wild](https://arxiv.org/pdf/1603.03101v1.pdf) -Yahoo, CVPR2016<br>\n\n## End-to-End & Text Spotting\n[ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting](https://arxiv.org/abs/2211.10578) -ustc, PAMI2023,[code](https://github.com/FangShancheng/ABINet-PP)<br>\n[Language Matters: A Weakly Supervised Pre-training Approach for Scene Text Detection and Spotting](https://arxiv.org/abs/2203.03911) -bytedance, arxiv2022<br>\n[DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting](https://arxiv.org/abs/2203.05122) -naver, arxiv2022, [code]()<br>\n[SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition](https://arxiv.org/abs/2203.10209) -jinlianwen, CVPR2022, [code](https://github.com/mxin262/SwinTextSpotter)<br>\n[End-to-End Video Text Spotting with Transformer](https://arxiv.org/abs/2203.10539) -shenchunhua, arxiv2022, [code](https://github.com/weijiawu/TransDETR)<br>\n[Text Spotting Transformers](https://arxiv.org/abs/2204.01918) -intel, CVPR2022<br>\n[PP-OCRv3: More Attempts for the Improvement of Ultra Lightweight OCR System](https://arxiv.org/abs/2206.03001) -baidu, arxiv2022<br>\n\\[light\\][PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System](https://arxiv.org/abs/2109.03144) -baidu, arxiv2021, [code](https://github.com/PaddlePaddle/PaddleOCR)<br>\n[icdar competition][1st Place Solution to ICDAR 2021 RRC-ICTEXT End-to-end Text Spotting and Aesthetic Assessment on Integrated 
Circuit](https://arxiv.org/abs/2104.03544) -hikvision, arxiv2021<br>\n[ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting](https://arxiv.org/abs/2105.03620) -jinlianwen, arxiv2021, [code](https://github.com/aim-uofa/AdelaiDet/)<br>\n\\[light\\][PP-OCR: A Practical Ultra Lightweight OCR System](https://arxiv.org/abs/2009.09941) -baidu, arxiv2020, [code](https://github.com/PaddlePaddle/PaddleOCR)<br>\n[Character Region Attention For Text Spotting](https://arxiv.org/abs/2007.09629) -ECCV2020<br>\n[Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting](https://arxiv.org/abs/2007.09482) -baixiang, ECCV2020, [code]()<br>\n[Text Detection and Recognition in the Wild: A Review](https://arxiv.org/abs/2006.04305) -arxiv2020<br>\n[Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting](https://arxiv.org/abs/2002.06820) -Liang Qiao, AAAI2020<br>\n[All You Need Is Boundary: Toward Arbitrary-Shaped Text Spotting](https://arxiv.org/abs/1911.09550) -baixiang, AAAI2020<br>\n[ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network](https://arxiv.org/abs/2002.10200) -jin lianwen, CVPR2020<br>\n[Convolutional Character Networks](https://arxiv.org/abs/1910.07954) -Linjie Xing, Zhi Tian, Weilin Huang, Matthew R. 
Scott, ICCV2019<br>\n[TextDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting](http://openaccess.thecvf.com/content_ICCV_2019/papers/Feng_TextDragon_An_End-to-End_Framework_for_Arbitrary_Shaped_Text_Spotting_ICCV_2019_paper.pdf) -Chenglin Liu, ICCV2019<br>\n[Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes](https://arxiv.org/abs/1908.08207) -baixiang, TPAMI2019<br>\n[Towards Unconstrained End-to-End Text Spotting](https://arxiv.org/abs/1908.09231) -google ai, arxiv2019<br>\n[Towards End-to-End Text Spotting in Natural Scenes](https://arxiv.org/abs/1906.06013) -Hui Li, Peng Wang, Chunhua Shen, arxiv2019<br>\n[Weakly supervised precise segmentation for historical document images](https://www.sciencedirect.com/science/article/pii/S0925231219304989) -Jin Lianwen, Neurocomputing2019<br>\n[A Novel Integrated Framework for Learning both Text Detection and Recognition](https://arxiv.org/abs/1811.08611) -alibaba, arxiv2018<br>\n[TextNet: Irregular Text Reading from Images with an End-to-End Trainable Network](https://arxiv.org/abs/1812.09900) -baidu, arxiv2018<br>\n[Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes](https://arxiv.org/abs/1807.02242) -Pengyuan Lyu, Minghui Liao, Cong Yao, Wenhao Wu, Xiang Bai, arxiv2018<br>\n[FOTS: Fast Oriented Text Spotting with a Unified Network](https://arxiv.org/abs/1801.01671) -Xuebo Liu, Ding Liang, Shi Yan, Dagui Chen, Yu Qiao, Junjie Yan, CVPR2018<br>\n[E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text](https://arxiv.org/abs/1801.09919) -Yash Patel, et al, arxiv2018<br>\n[SEE: Towards Semi-Supervised End-to-End Scene Text Recognition](http://arxiv.org/abs/1712.05404) -Christian Bartz, Haojin Yang, Christoph Meinel, AAAI2018<br>\n[An end-to-end TextSpotter with Explicit Alignment and Attention](https://arxiv.org/abs/1803.03474) -Tong He, Zhi Tian, Weilin Huang, Chunhua Shen, Yu Qiao, Changming Sun, 
CVPR2018<br>\n[Towards End-to-end Text Spotting with Convolutional Recurrent Neural Networks](http://openaccess.thecvf.com/content_ICCV_2017/papers/Li_Towards_End-To-End_Text_ICCV_2017_paper.pdf) -Hui Li, et al, ICCV2017<br>\n[Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework](http://openaccess.thecvf.com/content_ICCV_2017/papers/Busta_Deep_TextSpotter_An_ICCV_2017_paper.pdf) -Michal Busta, et al, ICCV2017, [code](https://github.com/MichalBusta/DeepTextSpotter)<br>\n[Reading Text in the Wild with Convolutional Neural Networks](https://link.springer.com/article/10.1007%2Fs11263-015-0823-z) -Max Jaderberg, et al, IJCV2016<br>\n\n## Text Retrieval\n[Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers](https://arxiv.org/abs/2103.16553) -zisserman, CVPR2021<br>\n[Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval](https://arxiv.org/abs/2104.00650) -zisserman, arxiv2021<br>\n[Scene Text Retrieval via Joint Text Detection and Similarity Learning](https://arxiv.org/abs/2104.01552) -baixiang, CVPR2021, [code/CSVTR database](https://github.com/lanfeng4659/STR-TDSL)<br>\n\n### Synthesis\n[synthtiger](https://github.com/clovaai/synthtiger)<br>\n[Editing Text in the Wild](https://arxiv.org/abs/1908.03047) -baidu, ACM MM 2019<br>\n[Data Augmentation for Scene Text Recognition](https://arxiv.org/abs/2108.06949) -ICCV2021 workshop, [code](https://github.com/roatienza/straug)<br>\n[text_renderer](https://github.com/oh-my-ocr/text_renderer)<br>\n[SynthText](https://github.com/ankush-me/SynthText)<br>\n[TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator)<br>\n[UnrealText](https://github.com/Jyouhou/UnrealText/)<br>\n[ScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation](https://arxiv.org/abs/2003.10557) -Sharon Fogel, CVPR2020<br>\n[SynthText3D: Synthesizing Scene Text Images from 3D 
Virtual Worlds](https://arxiv.org/abs/1907.06007) -Minghui Liao, Boyu Song, Minghang He, Shangbang Long, Cong Yao, Xiang Bai, arxiv2019, [code](https://github.com/MhLiao/SynthText3D)<br>\n[Spatial Fusion GAN for Image Synthesis](https://arxiv.org/abs/1812.05840) -Fangneng Zhan, Hongyuan Zhu, Shijian Lu, CVPR2019, [code](https://github.com/Sunshine352/SF-GAN)<br>\n[Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes](https://arxiv.org/abs/1807.03021) -Fangneng Zhan, Shijian Lu, Chuhui Xue, ECCV2018<br>\n\n### Evaluation\n[CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks](https://arxiv.org/abs/2006.06244) -arxiv2020<br>\n[End-To-End Measure for Text Recognition](https://arxiv.org/abs/1908.09584) -ICDAR2019<br>\n[Tightness-aware Evaluation Protocol for Scene Text Detection](https://arxiv.org/abs/1904.00813) -jinlianwen, CVPR2019<br>\n\n### Script identification\n[Patch Aggregator for Scene Text Script Identification](https://arxiv.org/abs/1912.03818) -baixiang, arxiv2019<br>\n\n### Super Resolution\n[Text Gestalt: Stroke-Aware Scene Text Image Super-Resolution](https://ojs.aaai.org/index.php/AAAI/article/view/19904) -fudan, AAAI2022, [code](https://github.com/FudanVI/FudanOCR/tree/main/text-gestalt)<br>\n[Restormer: Efficient Transformer for High-Resolution Image Restoration](https://arxiv.org/abs/2111.09881) -google, CVPR2022, [code](https://github.com/swz30/Restormer)<br>\n[Scene Text Telescope: Text-Focused Scene Image Super-Resolution](https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Scene_Text_Telescope_Text-Focused_Scene_Image_Super-Resolution_CVPR_2021_paper.html?ref=https://githubhelp.com) -fudan, CVPR2021<br>\n[Text Prior Guided Scene Text Image Super-resolution](https://arxiv.org/abs/2106.15368) -arxiv2021, [code](https://github.com/mjq11302010044/TPGSR)<br>\n[Scene Text Image Super-Resolution in the Wild](https://arxiv.org/abs/2005.03341) -baixiang, ECCV2020<br>\n\n## Other\n[AnyText: 
Multilingual Visual Text Generation And Editing](https://arxiv.org/abs/2311.03054) -alibaba, arxiv2023, [code](https://github.com/tyxsspa/AnyText)<br> \n[Stroke-Based Scene Text Erasing Using Synthetic Data for Training](https://ieeexplore.ieee.org/abstract/document/9609970) -TIP2021<br>\n[Page Layout Analysis System for Unconstrained Historic Documents](https://arxiv.org/abs/2102.11838) -ICDAR2021<br>\n[EraseNet: End-to-End Text Removal in the Wild](https://ieeexplore.ieee.org/document/9180003) -Jinlianwen, TIP2020, [code](https://github.com/HCIILAB/SCUT-EnsText)<br>\n[SwapText: Image Based Texts Transfer in Scenes](https://arxiv.org/abs/2003.08152) -Qiangpeng Yang, CVPR2020<br>\n[UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World](https://arxiv.org/abs/2003.10608) -Cong Yao, CVPR2020<br>\n[EnsNet: Ensconce Text in the Wild](https://ojs.aaai.org/index.php/AAAI/article/view/3859) -JinLianwen, AAAI2019, [code](https://github.com/HCIILAB/SCUT-EnsText)<br>\n[TextSR: Content-Aware Text Super-Resolution Guided by Recognition](https://arxiv.org/abs/1909.07113) -forevision, arxiv2019<br>\n[Editing Text in the Wild](https://arxiv.org/abs/1908.03047) -baixiang, ACM MM2019<br>\n[MTRNet: A Generic Scene Text Eraser](https://ieeexplore.ieee.org/abstract/document/8978083) -ICDAR2019<br>\n[Scene Text Detection and Recognition: The Deep Learning Era](https://arxiv.org/abs/1811.04256) -face++, arxiv2018<br>\n[Text/non-text image classification in the wild with convolutional neural networks](https://www.sciencedirect.com/science/article/pii/S0031320316303922) -X Bai, B Shi, C Zhang, X Cai, L Qi, PR2017<br>\n[Scene text script identification with convolutional recurrent neural networks](http://ieeexplore.ieee.org/abstract/document/7900268/) -J Mei, L Dai, B Shi, X Bai, ICPR2016<br>\n\n## Seq2Seq\n[Convolutional Sequence to Sequence Learning](https://arxiv.org/abs/1705.03122) -FAIR, ICML2017<br>\n[Sequence Level Training with Recurrent Neural 
Networks](https://arxiv.org/abs/1511.06732) -FAIR, ICLR2016<br>\n[A Convolutional Encoder Model for Neural Machine Translation](https://arxiv.org/abs/1611.02344) -FAIR, arxiv2016<br>\n\n## Reading Order\n[LayoutReader: Pre-training of Text and Layout for Reading Order Detection](https://arxiv.org/abs/2108.11591) -MSRA, EMNLP2021, [code/database](https://github.com/microsoft/unilm/tree/master/layoutreader)<br>\n\n## Database & Generation\n### chinese\n[TRW15: ICDAR 2015 Text Reading in the Wild Competition](https://arxiv.org/abs/1506.03184)<br>\nRCTW-17: ICDAR2017-Reading Chinese Text in the Wild<br>\nSTV2k: A New Benchmark for Scene Text Detection and Recognition<br>\n[CTW: Chinese Text in the Wild](https://ctwdataset.github.io/)<br>\nPAL10K<br>\n[COCO TS Dataset](https://arxiv.org/abs/1904.00818)<br>\n[ICPR MTWI 2018 Challenge 1: Text Recognition in Web Images](https://tianchi.aliyun.com/competition/information.htm?raceId=231650)<br>\n### other\n[Comprehensive Benchmark Datasets for Amharic Scene Text Detection and Recognition](https://arxiv.org/abs/2203.12165)<br>\n[Textual Visual Semantic Dataset for Text Spotting](https://github.com/HCIILAB/dataset)<br>\n[RoadText-1K: Text Detection & Recognition Dataset for Driving Videos](https://arxiv.org/abs/2005.09496)<br>\n[DDI-100: Dataset for Text Detection and Recognition](https://arxiv.org/abs/1912.11658)<br>\n[Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes](https://arxiv.org/pdf/1807.03021.pdf) -Fangneng Zhan, Shijian Lu, and Chuhui Xue, arxiv2018<br>\n[Total-Text](https://github.com/cs-chan/Total-Text-Dataset) -1555 images<br>\n[SCUT-CTW1500](https://github.com/Yuliang-Liu/Curve-Text-Detector) -Curved text in the wild<br>\n[MLT: Multi-lingual scene text detection and script identification](http://rrc.cvc.uab.es/?ch=8) -Multi-lingual text: 18,000 images, 9 different languages representing 6 different scripts<br>\n[Synthetic Word Dataset](http://www.robots.ox.ac.uk/~vgg/data/text/), [Synthetic Data and Artificial 
Neural Networks for Natural Scene Text Recognition](https://arxiv.org/abs/1406.2227)<br>\n[Total-text: A comprehensive dataset for scene text detection and recognition](https://ieeexplore.ieee.org/abstract/document/8270088/) -Chee Kheng Ch'ng, Chee Seng Chan<br>\nStreet View Text(SVT)<br>\nIIIT 5k-words<br>\nMSRA-TD500<br>\nKAIST Scene_Text Database<br>\nICDAR2011, ICDAR2013, ICDAR2015, ICDAR2017, robust reading-Focused Scene Text<br>\nICDAR2017-ICDAR 2017 Robust Reading Challenge on Omnidirectional Video(DOST)<br>\nCOCO-Text<br>\nGoogle French Street Name Signs (FSNS) dataset<br>\nICDAR2017-ICDAR2017 Competition on Multi-lingual scene text detection and script identification(MLT)<br>\nICDAR2017-Born-Digital Images (Web and Email)<br>\n[Detecting Curve Text in the Wild: New Dataset and New Solution](https://arxiv.org/abs/1712.02170)<br>\n[Synthetic Word](http://www.robots.ox.ac.uk/~vgg/data/text/)<br>\n[Synthetic Data for Text Localisation in Natural Images](https://www.cv-foundation.org/openaccess/content_cvpr_2016/app/S10-06.pdf) -Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, CVPR2016<br>\n### vietnamese\n[VinText](https://github.com/VinAIResearch/dict-guided)<br>\n\n## Competition\n[ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17)](https://arxiv.org/abs/1708.09585) -B Shi, C Yao, M Liao, M Yang, P Xu, L Cui, arxiv2017<br>\n[ICDAR 2015 competition on robust reading](https://ieeexplore.ieee.org/abstract/document/7333942/)<br>\n[Incidental Scene Text Understanding: Recent Progresses on ICDAR 2015 Robust Reading Competition Challenge 4](https://arxiv.org/abs/1511.09207) -Cong Yao, Jianan Wu, Xinyu Zhou, Chi Zhang, Shuchang Zhou, Zhimin Cao, Qi Yin<br>\n\n## Link\n[awesome-deep-text-detection-recognition](https://github.com/hwalsuklee/awesome-deep-text-detection-recognition)<br>\n[Awesome-Scene-Text-Recognition](https://github.com/chongyangtao/Awesome-Scene-Text-Recognition)<br>\n[Scene Text 
Detection](https://paperswithcode.com/task/scene-text-detection/codeless)<br>\n"
  }
]