Internet reviews can be seen as an efficient form of communication, adapted to today's digital world. However, researchers have, for the most part, focused on English-language reviews. Romanian-language reviews follow specific grammar rules and pose challenges that require customized methods. In their paper, Versavia-Maria Ancusa, Olimpia Ban, and Marian Cornea offer a method for aggregating heterogeneous Romanian-language reviews into a homogeneous corpus, fit for further analysis.
Basically, their aim is to build an eWOM (electronic word-of-mouth) data cleaning algorithm and apply it to the Romanian language. The algorithm is based on a three-phase process: the first stage consists of data collection, the second of basic processing, and the last focuses on analysis.
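As a rough illustration of what a three-phase cleaning process of this kind might look like, here is a minimal Python sketch. It is not the authors' algorithm: the function names (`collect`, `basic_processing`, `analyze`, `run_pipeline`) are hypothetical, and the specific cleaning steps (mapping legacy cedilla letters to the standard comma-below Romanian letters, lowercasing, collapsing whitespace, counting word frequencies) are assumptions chosen only to make each phase concrete.

```python
import re
import unicodedata
from collections import Counter

# Assumption: legacy cedilla forms are mapped to the standard
# comma-below Romanian letters so that spellings become uniform.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015f": "\u0219",  # s with cedilla -> s with comma below
    "\u015e": "\u0218",  # S with cedilla -> S with comma below
    "\u0163": "\u021b",  # t with cedilla -> t with comma below
    "\u0162": "\u021a",  # T with cedilla -> T with comma below
})


def collect(raw_reviews):
    """Phase 1 (stand-in for real data collection): keep non-empty reviews."""
    return [r for r in raw_reviews if r and r.strip()]


def basic_processing(text):
    """Phase 2: normalize Unicode, diacritics, case, and whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = text.translate(CEDILLA_TO_COMMA)
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()


def analyze(cleaned_reviews):
    """Phase 3: count word frequencies across the cleaned corpus."""
    counts = Counter()
    for text in cleaned_reviews:
        counts.update(re.findall(r"\w+", text))
    return counts


def run_pipeline(raw_reviews):
    """Run the three phases in order and return (cleaned texts, word counts)."""
    cleaned = [basic_processing(r) for r in collect(raw_reviews)]
    return cleaned, analyze(cleaned)
```

Calling `run_pipeline` on a list of raw review strings returns the cleaned texts together with a `Counter` of word frequencies. A real pipeline would replace each phase with the rules described in the paper.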
An online review database was used, consisting of 15 200 reviews written by 8912 different authors, whose age distribution offers a significant viewpoint on the modern Romanian language. Its columns included a unique identifier for the author, geographical details, a written review describing the experience, and a numerical trip satisfaction score.
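To make the described layout concrete, a record with these four columns could be modeled as follows. This is only a sketch: the field names (`author_id`, `location`, `text`, `score`) and the `summarize` helper are hypothetical, and the actual column names and types in the authors' database are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    # All field names are illustrative assumptions, not the real schema.
    author_id: str  # unique identifier of the author
    location: str   # geographical details
    text: str       # written review describing the experience
    score: float    # numerical trip satisfaction score


def summarize(reviews):
    """Return the number of reviews, distinct authors, and the mean score."""
    if not reviews:
        return {"reviews": 0, "authors": 0, "mean_score": None}
    authors = {r.author_id for r in reviews}
    mean_score = sum(r.score for r in reviews) / len(reviews)
    return {
        "reviews": len(reviews),
        "authors": len(authors),
        "mean_score": mean_score,
    }
```

A summary of this kind is a quick sanity check on a collected dataset, for example to confirm that the number of reviews and distinct authors matches what was expected before any cleaning begins.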
One of the drawbacks of this method is that it relies heavily on human input and decisions, and it needs to be constantly updated to keep pace with language evolution. However, further research and work will focus on automating these stages, using machine learning algorithms that simplify the researchers’ work by detecting unexpected patterns and determining new rules for them.
The upcoming issue of BRAIN Journal will provide more details on how this algorithm works.