BRAIN: Novel Ontologies-Based OCR – Error Correction Cooperating with Graph Component Extraction

The article by Sarunya Kanjanawattana and Masaomi Kimura is a study of Optical Character Recognition (OCR), a typical tool used to transform image-based characters into computer-editable characters. The two authors present a novel method that combines graph component extraction with OCR-error correction.

In recent years, graphs have become very important to researchers, as they contain significant information that can be extracted and used. Graphs summarize data, presenting essential information that is interpreted from small descriptive details. To obtain a primary interpretation, OCR is used as a suitable solution for acquiring graph components in a digital format of character letters. This study uses a collection of bar graphs, each containing at least axis descriptions and a legend, to illustrate OCR.

[Figure: Steps of candidate selection]

OCR is widely used: thousands of paper-based documents have been converted to digitized information with it. However, it does not provide a 100% correct result. Poor printing quality, low image resolution, specific language requirements and image noise cause misrecognition that produces OCR errors. Take the word “BED”: it can be recognized as “8ED”, which is an error. These errors can be classified into non-word errors and real-word errors. Non-word errors generate words that do not exist, while real-word errors produce a word that exists but differs from the one printed. The distinction matters, as people who work with OCR should watch for such errors in order to notice incorrectly recognized words. Moreover, OCR should not be applied directly to graph images, as this can cause recognition noise.
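The distinction between non-word and real-word errors can be illustrated with a minimal Python sketch. The function name `classify_ocr_token` and the tiny vocabulary are hypothetical and chosen only for illustration; a real system would use a full dictionary, and the article does not describe how the authors detect these error types.

```python
def classify_ocr_token(recognized, intended, vocabulary):
    """Label an OCR output token relative to the word that was printed.

    `vocabulary` is an illustrative stand-in for a real dictionary.
    Comparison is case-insensitive.
    """
    recognized = recognized.lower()
    intended = intended.lower()
    if recognized == intended:
        return "correct"
    if recognized in vocabulary:
        # The output is a valid word, but not the one that was printed.
        return "real-word error"
    # The output is not a word at all.
    return "non-word error"


# Hypothetical vocabulary, for illustration only.
VOCABULARY = {"bed", "bad", "red"}

print(classify_ocr_token("8ED", "BED", VOCABULARY))
print(classify_ocr_token("BAD", "BED", VOCABULARY))
print(classify_ocr_token("BED", "BED", VOCABULARY))
```

Real-word errors are the harder case in practice, because a dictionary lookup alone cannot flag them; some knowledge of the expected word or its context is needed.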

The article refers to previous studies on image segmentation (a technique used to capture and separate dominant objects from image backgrounds) and on OCR-error correction. This study, however, uses a pre-processing step and proposes a post-processing method to address the difficulty of OCR errors. The methodology is divided into graph-component extraction (whose task is to separate the components into individual images) and, as in previous works, OCR-error correction (which uses ontologies and integrates an edit distance and NLP into the correction system).
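To give a rough idea of how an edit distance can drive candidate selection, here is a minimal Python sketch. It uses the standard Levenshtein distance and a plain list of terms as a stand-in for ontology concepts; the helper names, the term list and the `max_distance` threshold are assumptions made for illustration, and the sketch does not reproduce the authors' ontology-based or NLP-based steps.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def best_candidate(token, terms, max_distance=2):
    """Return the term closest to `token`, or None if none is close enough.

    `terms` stands in for the labels of ontology concepts (an assumption).
    """
    scored = sorted((edit_distance(token.lower(), t.lower()), t) for t in terms)
    if scored and scored[0][0] <= max_distance:
        return scored[0][1]
    return None


# Hypothetical term list, for illustration only.
TERMS = ["bed", "table", "chair"]
print(best_candidate("8ED", TERMS))
```

A threshold such as `max_distance` keeps badly garbled tokens from being forced onto an unrelated term; its value here is arbitrary.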

To evaluate the methods and the theory presented, Sarunya Kanjanawattana and Masaomi Kimura conducted four experiments. The first experiment combined the image partition method with edit distance. All performance rates showed their lowest values, except the noise ratio, which reached 29.48%. The second experiment combined the graph component extraction with edit distance. Accuracy and F-measure increased to 57.28% and 50.54%, respectively. The third experiment combined the image partition method with the OCR-error correction. The performance rates improved compared to the first experiment: accuracy reached 80.75% and the F-measure reached 92.28%. Finally, the last experiment consisted of combining the first and the second models proposed by this study. The results were an accuracy of 84.23% and an F-measure of 86.02%.
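For readers unfamiliar with the two metrics reported above, the following Python sketch computes accuracy and F-measure from confusion counts using their standard definitions. The article does not state exactly how the authors counted tokens, so the function names and the counts below are illustrative assumptions, not the study's data.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all decisions that were right."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (F1)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Made-up counts, for illustration only.
tp, tn, fp, fn = 8, 8, 2, 2
print(f"accuracy  = {accuracy(tp, tn, fp, fn):.2%}")
print(f"F-measure = {f_measure(tp, fp, fn):.2%}")
```

Because the F-measure ignores true negatives, it can be higher or lower than accuracy on the same data.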

Based on these statistics, the researchers calculated the token errors and the differences between the results of the four experiments. They then proposed a new method of OCR-error correction for bar graph images using semantics. They obtained the desired results and showed that the method achieved higher performance rates than the other methods.

The next stage of the research consists of graph-content information extraction and of designing a new ontology to support extractable graph information and to utilize other ontologies in order to reveal latent information.

Mihaela Guțu