Five Text Analytics Approaches: A Comprehensive Review
The purpose of data goes beyond collection and storage. If meaning is to be extracted from data, it needs to be organised and structured. When applied to data that is in a text format, this process is called text analytics.
By examining customer reviews and feedback, for instance, a business can make certain changes to offer customers better products and services. This is a use of text analytics.
There are ten key processes and features of text analytics. The first are text identification, text mining, and text categorisation. These are followed by text clustering, search access, entity relation modelling, and link analysis. The final stages are sentiment analysis, summarisation, and visualisation.
There are various approaches to carrying out text analytics. What follows is a review of five of them.
- Word Spotting
If you take a sentence, a keyword that represents the entire sentence can be singled out. When it comes to customer reviews, these keywords could be price, quality, and ease of use. Spotting these keywords is a text analytics approach known as word spotting or keyword spotting.
This approach works well with small data sets, where accurate results can be achieved. However, it is far from perfect because it only matches the literal keyword. In the sentence “Price could be lower”, price is spotted as the keyword. In the sentence “bill was too high”, nothing is spotted, even though both sentences are about the price of the product.
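The limitation described above can be seen in a minimal sketch of word spotting. The keyword list and the function name `spot_keywords` are assumptions made for illustration, not part of any particular text analytics product.

```python
import re

# Hypothetical keyword list, taken from the examples in this article.
KEYWORDS = {"price", "quality", "ease of use"}

def spot_keywords(text, keywords=KEYWORDS):
    """Return the keywords that literally appear in the text (case-insensitive)."""
    lowered = text.lower()
    return sorted(
        k for k in keywords
        if re.search(r"\b" + re.escape(k) + r"\b", lowered)
    )

print(spot_keywords("Price could be lower"))  # ['price']
print(spot_keywords("bill was too high"))     # [] : the word "price" never appears
```

The second sentence is about price, but because matching is purely literal, no keyword is found.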
There are five main instances when word spotting will not work. If you are working with a large volume of data, word spotting will not achieve the desired results. It will also not work if you cannot review each piece of text and correct the results by hand.
If you want to visualise the results, share them with others or maintain consistency, word spotting will also not work.
- Manual Rules
This approach is similar to word spotting but is more advanced. It categorises words in more complex scenarios and allows businesses to customise the rules for how a text analytics programme treats various words. Sub-categories can also be added under words, but the resulting set of rules can be extremely difficult to model.
If you take a word like expensive, it can be a sub-category of price, but the use of expensive in a sentence may not be as straightforward as saying the price is too high. As an example, a sentence like “It was not as expensive as I thought it would be” does not indicate a high price.
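To make the sub-category problem concrete, here is a small sketch of a manual rule set. The rule table, the negation list, and the three-word look-back window are all illustrative assumptions; real rule sets are usually far larger, which is where the modelling difficulty comes from.

```python
import re

# Hypothetical rule table: category -> sub-category -> trigger words.
RULES = {
    "price": {
        "high price": ["expensive", "overpriced", "pricey"],
        "low price": ["cheap", "affordable"],
    },
}

# Hypothetical list of words that flip the meaning of a trigger word.
NEGATIONS = {"not", "never", "no", "isn't", "wasn't"}

def apply_rules(text, rules=RULES, window=3):
    """Return (category, sub_category, negated) for each trigger word found."""
    tokens = re.findall(r"[a-z']+", text.lower())
    matches = []
    for category, subcats in rules.items():
        for subcat, triggers in subcats.items():
            for i, tok in enumerate(tokens):
                if tok in triggers:
                    # Look at the few words before the trigger for a negation.
                    preceding = tokens[max(0, i - window):i]
                    negated = any(t in NEGATIONS for t in preceding)
                    matches.append((category, subcat, negated))
    return matches

print(apply_rules("It was not as expensive as I thought it would be"))
# [('price', 'high price', True)]
print(apply_rules("It is too expensive"))
# [('price', 'high price', False)]
```

Even this single negation rule needs its own word list and window size, and every further exception adds another rule to maintain.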
- Text Categorisation
The involvement of a machine learning consultancy in text analytics is most visible with this approach. In text categorisation, patterns are learned from existing datasets and applied to new datasets in order to create forecasts or suggestions.
In this approach, the machine learning algorithm is fed text examples that have already been assigned categories. From these examples it learns how text is categorised and, in effect, creates rules for itself in the form of a predictive model.
The result of this process is that, when presented with new text, the model applies these learned rules to categorise it.
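The train-then-predict flow described above can be sketched with a small naive Bayes classifier. Naive Bayes is only one of many possible algorithms and is an assumption here, as are the six training sentences and the function names `train` and `predict`.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def train(examples):
    """Count words per category from (text, category) pairs."""
    word_counts = defaultdict(Counter)
    category_counts = Counter()
    vocab = set()
    for text, category in examples:
        category_counts[category] += 1
        for tok in tokenize(text):
            word_counts[category][tok] += 1
            vocab.add(tok)
    return word_counts, category_counts, vocab

def predict(model, text):
    """Return the category with the highest naive Bayes log-score."""
    word_counts, category_counts, vocab = model
    total_docs = sum(category_counts.values())
    best, best_score = None, -math.inf
    for category in category_counts:
        score = math.log(category_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing so unseen word/category pairs are not zero.
            count = word_counts[category][tok] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = category, score
    return best

# Hypothetical, hand-labelled training examples.
EXAMPLES = [
    ("Price could be lower", "price"),
    ("The bill was too high", "price"),
    ("Far too expensive for what you get", "price"),
    ("Broke after two days", "quality"),
    ("The material feels cheap and flimsy", "quality"),
    ("Stopped working after a week", "quality"),
]

model = train(EXAMPLES)
print(predict(model, "My bill was high this month"))  # price
print(predict(model, "It broke after a week"))        # quality
```

Unlike word spotting, the first prediction is "price" even though the word itself never appears, because the model learned that "bill" and "high" occur in price-related examples.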
- Topic Modelling
Topic modelling is an unsupervised text analytics approach where the algorithm is fed raw textual data, from which it picks out clusters of words that represent topics. AI service providers use this approach as it does not need any input besides the raw data.
However, this approach may not be ideal for feedback analysis because the resulting topics can be extremely difficult to interpret, especially since language is understood in different ways.
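As a rough illustration, the sketch below implements one widely used topic model, latent Dirichlet allocation (LDA), with collapsed Gibbs sampling. The article does not name a specific algorithm, so the choice of LDA, the hyperparameter values, and the toy documents are all assumptions.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics=2, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenised documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]        # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics
    assignments = []
    # Start with a random topic for every word occurrence.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment, then resample it.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

def top_words(topic_word, n=3):
    """Most frequent words assigned to each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]

# Hypothetical, already tokenised feedback snippets.
docs = [
    ["price", "high", "bill", "expensive"],
    ["bill", "price", "cost", "high"],
    ["broke", "quality", "flimsy", "material"],
    ["quality", "material", "broke", "cheap"],
]

doc_topic, topic_word = lda_gibbs(docs)
print(top_words(topic_word))
```

The output is just lists of words per topic with no labels attached, which is why interpreting the topics is left to the analyst.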
- Thematic Analysis
This is an approach where themes are extracted from text instead of the text being categorised. The themes that are extracted have the potential of being meaningful and insightful when analysing the entire dataset.
Like topic modelling, thematic analysis is an unsupervised approach. This means that there is no need to set up categories in advance or train the algorithm. However, thematic analysis can be difficult to implement correctly.