IJCAI TEXT PAPERS

百里涛

2023-12-01

A quick read-through of IJCAI papers on text detection and recognition, surveying the current state of the art on the related datasets.

1.[IJCAI 2019] Learning to Draw Text in Natural Images with Conditional Adversarial Networks

STS-GAN: generates text images.

In this work, we propose an entirely learning-based method to automatically synthesize text sequence in natural images leveraging conditional adversarial networks. As vanilla GANs are clumsy to capture structural text patterns, directly employing GANs for text image synthesis typically results in illegible images. Therefore, we design a two-stage architecture to generate repeated characters in images. Firstly, a character generator attempts to synthesize local character appearance independently, so that the legible characters in sequence can be obtained. To achieve style consistency of characters, we propose a novel style loss based on variance-minimization. Secondly, we design a pixel-manipulation word generator constrained by self-regularization, which learns to convert local characters to plausible word image. Experiments on SVHN dataset and ICDAR, IIIT5K datasets demonstrate our method is able to synthesize visually appealing text images. Besides, we also show the high-quality images synthesized by our method can be used to boost the performance of a scene text recognition algorithm.

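Notes: the paper proposes a fully learning-based method that uses conditional adversarial networks to synthesize text sequences in natural images. Because vanilla GANs struggle to capture structured text patterns, applying them directly to text image synthesis usually yields illegible results, so the authors use a two-stage architecture. First, a character generator synthesizes the appearance of each character independently so that every character in the sequence is legible, and a style loss based on variance minimization keeps the characters stylistically consistent. Second, a pixel-manipulation word generator, constrained by self-regularization, learns to turn the local characters into a plausible word image. Experiments on SVHN, ICDAR, and IIIT5K show that the method synthesizes visually appealing text images, and that these images can improve the performance of a scene text recognition algorithm.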
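The abstract names a style loss based on variance minimization but gives no formula. The sketch below is only one plausible reading, not the authors' definition: it assumes each generated character comes with a style vector (the `style_vectors` argument is hypothetical) and penalizes the average per-dimension variance of those vectors across the characters of one word, so the loss is zero when all characters share the same style.

```python
def style_variance_loss(style_vectors):
    """Mean per-dimension variance of style vectors across characters.

    style_vectors: list of equal-length lists of floats, one vector per
    generated character. These are hypothetical style embeddings; the
    abstract does not say which features the paper actually uses.
    """
    if not style_vectors:
        raise ValueError("need at least one style vector")
    dims = len(style_vectors[0])
    count = len(style_vectors)
    total = 0.0
    for d in range(dims):
        column = [vec[d] for vec in style_vectors]
        mean = sum(column) / count
        # Population variance of this style dimension over all characters.
        total += sum((x - mean) ** 2 for x in column) / count
    return total / dims
```

Minimizing this quantity pulls the characters of a word toward a shared style without fixing what that style is, which matches the stated goal of style consistency.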

2.[IJCAI 2019] Learning Unsupervised Visual Grounding Through Semantic Self-Supervision

Localizing natural language phrases in images is a challenging problem that requires joint understanding of both the textual and visual modalities. In the unsupervised setting, lack of supervisory signals exacerbate this difficulty. In this paper, we propose a novel framework for unsupervised visual grounding which uses concept learning as a proxy task to obtain self-supervision. The intuition behind this idea is to encourage the model to localize to regions which can explain some semantic property in the data, in our case, the property being the presence of a concept in a set of images. We present thorough quantitative and qualitative experiments to demonstrate the efficacy of our approach and show a 5.6% improvement over the current state of the art on Visual Genome dataset, a 5.8% improvement on the ReferItGame dataset and comparable to state-of-art performance on the Flickr30k dataset.

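Notes: localizing natural language phrases in images is hard because it requires joint understanding of the textual and visual modalities, and in the unsupervised setting the lack of supervisory signals makes it harder still. The paper proposes a framework for unsupervised visual grounding that uses concept learning as a proxy task to obtain self-supervision. The intuition is to encourage the model to localize to regions that explain some semantic property of the data, here the presence of a concept in a set of images. The reported results are a 5.6% improvement over the previous state of the art on Visual Genome, a 5.8% improvement on ReferItGame, and performance comparable to the state of the art on Flickr30k.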

3.[IJCAI 2019] MSR: Multi-Scale Shape Regression for Scene Text Detection

State-of-the-art scene text detection techniques predict quadrilateral boxes that are prone to localization errors while dealing with straight or curved text lines of different orientations and lengths in scenes. This paper presents a novel multi-scale shape regression network (MSR) that is capable of locating text lines of different lengths, shapes and curvatures in scenes. The proposed MSR detects scene texts by predicting dense text boundary points that inherently capture the location and shape of text lines accurately and are also more tolerant to the variation of text line length as compared with the state of the arts using proposals or segmentation. Additionally, the multi-scale network extracts and fuses features at different scales which demonstrates superb tolerance to the text scale variation. Extensive experiments over several public datasets show that the proposed MSR obtains superior detection performance for both curved and straight text lines of different lengths and orientations.

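Notes: state-of-the-art scene text detectors predict quadrilateral boxes, which are prone to localization errors on straight or curved text lines of varying orientation and length. The paper presents a multi-scale shape regression network (MSR) that can locate text lines of different lengths, shapes, and curvatures. MSR detects text by predicting dense text boundary points, which capture the location and shape of a text line accurately and tolerate variation in text line length better than proposal-based or segmentation-based methods. The multi-scale network also extracts and fuses features at different scales, which makes it tolerant to variation in text scale. Experiments on several public datasets show superior detection performance on both curved and straight text lines of different lengths and orientations.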
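To make "predicting dense text boundary points" concrete, the sketch below builds one possible regression target. It assumes, beyond what the abstract states, that each text pixel regresses the offset to its nearest point on the text boundary. The function names and the `points_per_edge` parameter are illustrative only.

```python
import math


def sample_boundary(polygon, points_per_edge=10):
    """Densely sample points along the edges of a closed polygon.

    polygon: list of (x, y) vertices in order around the text region.
    """
    samples = []
    n = len(polygon)
    for i in range(n):
        (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % n]
        for k in range(points_per_edge):
            t = k / points_per_edge
            samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return samples


def nearest_boundary_offset(pixel, boundary):
    """Return (dx, dy) from a pixel to its nearest sampled boundary point."""
    px, py = pixel
    bx, by = min(boundary, key=lambda b: math.hypot(b[0] - px, b[1] - py))
    return (bx - px, by - py)
```

Because the boundary is a set of points rather than four corners, the same target works for a curved polygon with many vertices as well as for a straight rectangle.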

4.[IJCAI 2019] Omnidirectional Scene Text Detection with Sequential-free Box Discretization

Scene text in the wild is commonly presented with high variant characteristics. Using quadrilateral bounding box to localize the text instance is nearly indispensable for detection methods. However, recent researches reveal that introducing quadrilateral bounding box for scene text detection will bring a label confusion issue which is easily overlooked, and this issue may significantly undermine the detection performance. To address this issue, in this paper, we propose a novel method called Sequential-free Box Discretization (SBD) by discretizing the bounding box into key edges (KE) which can further derive more effective methods to improve detection performance. Experiments showed that the proposed method can outperform state-of-the-art methods in many popular scene text benchmarks, including ICDAR 2015, MLT, and MSRA-TD500. Ablation study also showed that simply integrating the SBD into Mask R-CNN framework, the detection performance can be substantially improved. Furthermore, an experiment on the general object dataset HRSC2016 (multioriented ships) showed that our method can outperform recent state-of-the-art methods by a large margin, demonstrating its powerful generalization ability.

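Notes: scene text in the wild varies widely, and localizing a text instance with a quadrilateral bounding box is nearly indispensable for detection methods. Recent work shows, however, that quadrilateral boxes introduce an easily overlooked label confusion issue that can significantly undermine detection performance. To address it, the paper proposes Sequential-free Box Discretization (SBD), which discretizes the bounding box into key edges (KE), from which more effective methods for improving detection can be derived. In experiments the method outperforms state-of-the-art methods on popular scene text benchmarks including ICDAR 2015, MLT, and MSRA-TD500. An ablation study shows that simply integrating SBD into the Mask R-CNN framework substantially improves detection. On the general object dataset HRSC2016 (multi-oriented ships), the method also outperforms recent state-of-the-art methods by a large margin, which demonstrates its generalization ability.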
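The label confusion issue arises because the same quadrilateral can be annotated with its four vertices in different orders, which gives different regression targets for the same box. The sketch below illustrates the order-independence idea under a simplifying assumption: the key edges are taken to be the sorted x-coordinates and the sorted y-coordinates of the vertices. The function name `key_edges` is illustrative and this is not the authors' implementation.

```python
from itertools import permutations


def key_edges(quad):
    """Discretize a quadrilateral into order-independent key edges.

    quad: four (x, y) vertices in any order.
    Returns (xs, ys): the sorted x-coordinates and the sorted y-coordinates.
    """
    if len(quad) != 4:
        raise ValueError("a quadrilateral needs exactly four vertices")
    xs = tuple(sorted(x for x, _ in quad))
    ys = tuple(sorted(y for _, y in quad))
    return xs, ys


def is_order_invariant(quad):
    """Check that every vertex ordering yields the same key edges."""
    reference = key_edges(quad)
    return all(key_edges(list(p)) == reference for p in permutations(quad))
```

The sorted values alone do not say which x pairs with which y, so rebuilding the quadrilateral needs an additional pairing step that this sketch does not cover.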

Reposted from: https://www.cnblogs.com/wind-chaser/p/11394274.html
