BRAIN: Novel Ontologies-Based OCR – Error Correction Cooperating with Graph Component Extraction

The article by Sarunya Kanjanawattana and Masaomi Kimura is a study of Optical Character Recognition (OCR), a common tool for transforming image-based characters into computer-editable characters. The authors present a novel method that combines graph component extraction with OCR-error correction.

In recent years, graphs have become very important to researchers, as they contain significant information that can be extracted and used. Graphs summarize data, presenting essential information that can be interpreted from a few small descriptive details. To obtain a first interpretation, OCR is an appropriate solution for acquiring graph components in digital form as characters. This study uses a collection of bar graphs, each containing at least axis descriptions and a legend, to illustrate OCR.

Steps of candidate selection

OCR is widely used: thousands of paper-based documents have been converted to digitized information with it. However, it does not provide a 100% correct result. Poor printing quality, low image resolution, specific language requirements and image noise cause misrecognition that produces OCR errors. Take the word “BED”: it can be recognized as “8ED”, which is an error. These errors can be classified as non-word errors and real-word errors. A non-word error produces a string that is not an existing word, while a real-word error produces a word that exists but differs from the one printed. The distinction matters, because people who work with OCR need to watch for both kinds of error in order to notice incorrectly recognized words. In addition, OCR should not be applied directly to graph images, as this can cause recognition noise.
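The distinction between the two error classes can be sketched in a few lines of Python. This is only an illustration, not code from the article: the function name `classify_ocr_token` and the toy `VOCABULARY` are assumptions, and a real system would check tokens against a full lexicon or ontology.

```python
def classify_ocr_token(recognized, intended, vocabulary):
    """Label an OCR output token relative to the word that was actually printed."""
    if recognized == intended:
        return "correct"
    if recognized in vocabulary:
        # The output is a valid word, just not the one on the page.
        return "real-word error"
    # The output is not a word at all.
    return "non-word error"


# Toy lexicon, for illustration only (hypothetical, not from the article).
VOCABULARY = {"bed", "bad", "red"}

print(classify_ocr_token("8ed", "bed", VOCABULARY))  # not in the lexicon
print(classify_ocr_token("bad", "bed", VOCABULARY))  # valid but wrong word
print(classify_ocr_token("bed", "bed", VOCABULARY))  # recognized correctly
```

Note that a real-word error cannot be detected by a dictionary lookup alone, since the recognized token is a valid word; this is one reason contextual or semantic information is useful for correction.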

The article refers to previous studies on image segmentation (a technique used to capture and separate dominant objects from image backgrounds) and on OCR-error correction. This study, however, uses a pre-processing step and proposes a post-processing method to address the difficulty of OCR errors. The methodology is divided into graph component extraction (whose task is to separate the components into individual images) and, as in previous works, OCR-error correction (which uses ontologies and integrates edit distance and NLP into the correction system).
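To make the edit-distance part of the correction step concrete, here is a minimal sketch of candidate selection by Levenshtein distance. It is not the authors' implementation: the functions `edit_distance` and `best_candidates`, the `max_distance` threshold and the plain word list are all assumptions made for illustration, and the ontology and NLP components described in the article are left out.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def best_candidates(token, vocabulary, max_distance=2):
    """Return vocabulary words within max_distance edits of token, closest first."""
    scored = sorted(
        (edit_distance(token.lower(), word.lower()), word) for word in vocabulary
    )
    return [word for distance, word in scored if distance <= max_distance]


# Hypothetical word list standing in for terms drawn from an ontology.
words = ["BED", "RED", "TABLE"]
print(best_candidates("8ED", words))
```

Edit distance alone cannot choose between equally close candidates such as "BED" and "RED"; a system that also uses semantic information, as the article describes, has additional evidence for ranking them.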

To evaluate the methods and the theory presented, Sarunya Kanjanawattana and Masaomi Kimura conducted four experiments. The first experiment combined the image partition method with edit distance. All performance rates had their lowest values here, except the noise ratio, which reached 29.48%. The second experiment combined graph component extraction with edit distance. The accuracy and F-measure increased to 57.28% and 50.54%. The third experiment combined the image partition method with the OCR-error correction. The performance rates improved compared with the first experiment: the accuracy rose to 80.75% and the F-measure reached 92.28%. Finally, the last experiment combined the first and second models proposed by this study. The results were an accuracy of 84.23% and an F-measure of 86.02%.
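For readers unfamiliar with these measures, the sketch below computes accuracy and F-measure from confusion counts using their standard textbook definitions. The summary does not state exactly how the authors counted tokens, so the definitions may differ in detail from the paper's, and the counts in the example are hypothetical rather than taken from the experiments.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all decisions that were right."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (the F1 score)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
tp, tn, fp, fn = 8, 8, 2, 2
print(f"accuracy:  {accuracy(tp, tn, fp, fn):.2%}")
print(f"F-measure: {f_measure(tp, fp, fn):.2%}")
```

Because accuracy counts true negatives and the F-measure does not, the two can rank methods differently, which is worth keeping in mind when comparing the four experiments.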

Based on these statistics, the researchers calculated the token errors and the differences between the results of the four experiments. On this basis, they proposed a new method of OCR-error correction for bar graph images using semantics. They obtained the desired results and reported that the method achieved the highest performance rates among the methods compared.

The next stage of the research consists of graph-content information extraction and of designing a new ontology to support extractable graph information and to utilize other ontologies in order to reveal latent information.

Mihaela Guțu