BRAIN: Novel Ontologies-Based OCR – Error Correction Cooperating with Graph Component Extraction

The article by Sarunya Kanjanawattana and Masaomi Kimura is a study of Optical Character Recognition (OCR), a common tool for converting image-based characters into computer-editable characters. The two authors present a novel method that combines graph component extraction with OCR-error correction.

In recent years, graphs have become very important to researchers, as they contain significant information that can be extracted and reused. Graphs summarize data, and their essential information is interpreted by reading small descriptive details such as labels. To obtain this primary interpretation, OCR is a suitable solution for acquiring graph components as digital characters. This study uses a collection of bar graphs, each containing at least axis descriptions and a legend, to illustrate OCR.

[Figure: Steps of candidate selection]

OCR is widely used: thousands of paper-based documents have been converted to digitized information with it. However, it does not produce a 100% correct result. Poor printing quality, low image resolution, language-specific requirements and image noise all cause misrecognition, which produces OCR errors. Take the word “BED”: it can be recognized as “8ED”, and this is an error. These errors can be classified into non-word errors and real-word errors. A non-word error produces a string that does not exist as a word, while a real-word error produces a word that exists but differs from the one printed. The distinction matters, because people who work with OCR need to watch for both kinds of error in order to notice incorrectly recognized words. In addition, OCR should not be applied directly to graph images, as this can cause recognition noise.
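The difference between the two error types can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the `VOCABULARY` set and the `classify_ocr_token` function are hypothetical names, and a real system would use a full dictionary or a set of ontology terms instead of a toy word list.

```python
# Hypothetical toy vocabulary (an assumption for this sketch); a real system
# would rely on a full dictionary or on terms taken from an ontology.
VOCABULARY = {"bed", "bad", "red", "bar", "graph"}


def classify_ocr_token(recognized: str, intended: str, vocabulary=VOCABULARY) -> str:
    """Label one OCR output token relative to the word that was actually printed."""
    rec, ref = recognized.lower(), intended.lower()
    if rec == ref:
        return "correct"
    if rec in vocabulary:
        # The output is a valid word, just not the one on the page.
        return "real-word error"
    # The output is not a word at all.
    return "non-word error"


print(classify_ocr_token("8ED", "BED"))  # non-word error
print(classify_ocr_token("BAD", "BED"))  # real-word error
```

Real-word errors are the harder case in practice, since a plain dictionary lookup does not flag them.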

The article refers to previous studies on image segmentation (a technique used to capture and separate dominant objects from image backgrounds) and on OCR-error correction. This study, however, uses a pre-processing step and proposes a post-processing method to overcome the difficulty of OCR errors. The methodology is divided into graph component extraction (whose task is to separate the components into individual images) and, as in previous work, OCR-error correction (which uses ontologies and integrates an edit distance and natural language processing (NLP) into the correction system).
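To give an idea of how an edit distance can help pick a correction, here is a minimal sketch using the standard Levenshtein distance. It is not the method from the paper, which also relies on ontologies and NLP; the `best_candidates` helper, the `max_distance` threshold and the sample term list are assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def best_candidates(token: str, terms: list[str], max_distance: int = 2) -> list[str]:
    """Return the terms closest to the OCR token, or [] if none is close enough.

    `terms` stands in for words taken from an ontology (hypothetical here).
    """
    if not terms:
        return []
    scored = [(edit_distance(token.lower(), term.lower()), term) for term in terms]
    best = min(distance for distance, _ in scored)
    if best > max_distance:
        return []
    return [term for distance, term in scored if distance == best]


# Hypothetical term list for a bar graph about furniture sales.
print(best_candidates("8ED", ["bed", "bar", "graph", "legend"]))  # ['bed']
```

An edit distance alone only repairs non-word errors reliably, which may be why the authors combine it with ontologies and NLP.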

To evaluate the methods and the theory presented, Sarunya Kanjanawattana and Masaomi Kimura conducted four experiments. The first combined the image partition method with edit distance. All of its performance rates were the lowest of the four experiments, except the noise ratio, which reached 29.48%. The second combined graph component extraction with edit distance. Accuracy and F-measure rose to 57.28% and 50.54%. The third combined the image partition method with OCR-error correction. The performance rates improved compared with the first experiment: accuracy reached 80.75% and F-measure reached 92.28%. Finally, the last experiment combined the first and second models proposed by this study. The results were an accuracy of 84.23% and an F-measure of 86.02%.
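For readers unfamiliar with these two measures, the sketch below shows their standard definitions. The summary does not say how the authors counted correct and incorrect tokens, so the function names and the counts in the example are assumptions for illustration only, not data from the study.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all decisions that were right: (TP + TN) / total."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts, NOT taken from the paper:
# tp = tokens correctly fixed, fp = tokens wrongly changed,
# fn = errors that were missed, tn = correct tokens left alone.
print(f"accuracy:  {accuracy(tp=80, tn=10, fp=5, fn=5):.2%}")
print(f"F-measure: {f_measure(tp=80, fp=5, fn=5):.2%}")
```

Because accuracy counts true negatives and the F-measure does not, the two values can differ noticeably on the same experiment.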

Based on these statistics, the researchers calculated the token errors and the differences between the results of the four experiments. As a result, they proposed a new method of OCR-error correction for bar graph images that uses semantics. They obtained the desired results and reported that the method achieved higher performance rates than the other methods.

The next stage of the research consists of extracting graph-content information, designing a new ontology to support extractable graph information, and using other ontologies to reveal latent information.

Mihaela Guțu