The definition of text mining seems simple, but the actual activity is not
What is text mining?
Text mining, also known as text data mining or text analytics, is often confused with information retrieval. As Wikipedia suggests, text mining is “the process of deriving high-quality information from text”. Data mining processes structured information, extracting useful information from data sets and transforming it for further use; text mining, by contrast, deals with unstructured information. Basically, it’s the process that allows the identification of new and unexpected information in a collection of text.
Even if the definition of text mining appears easy enough, we can’t say the same about the actual activity. To be effective and useful, text mining requires Natural Language Processing, which is why not every technology is suited to it.
To show how it works, we’ll process a very elementary sentence: John said he saw his brother Jack in Baghdad.
The traditional text mining approach
By identifying a list of standard entities (people, organizations, places, etc.), a traditional text mining approach can understand that John is a person, but it probably can’t understand what happened (who saw his brother?) or the relationship (who is Jack’s brother?). As a result, while it knows that the text talks about a person, it doesn’t extract any other information or insight.
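To make the limitation concrete, here is a minimal sketch of entity spotting with a lookup list. The gazetteer and the `tag_entities` function are hypothetical names invented for this illustration, not part of any real product; the point is only that such an approach labels names and stops there.

```python
import re

# Hypothetical gazetteer: a hand-made lookup table for this example only.
GAZETTEER = {
    "John": "PERSON",
    "Jack": "PERSON",
    "Baghdad": "PLACE",
}

def tag_entities(text):
    """Return (token, label) pairs for every token found in the gazetteer."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return [(tok, GAZETTEER[tok]) for tok in tokens if tok in GAZETTEER]

sentence = "John said he saw his brother Jack in Baghdad."
entities = tag_entities(sentence)
print(entities)
# [('John', 'PERSON'), ('Jack', 'PERSON'), ('Baghdad', 'PLACE')]
```

The output lists two people and a place, but nothing in it says who saw whom, who "he" and "his" refer to, or that the two people are brothers.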
To identify entities which, although not explicitly cited in a text, have a strong, consistent connection with entities already detected in the content, we need an intelligent technology that is able to understand the correct meaning of words and expressions in context, as well as the relationships between different concepts.
In other words, this requires a technology that can mimic the human capacity for disambiguating, reading and understanding text. Unlike traditional approaches, Expert System’s semantic software Cogito can identify the possible relationships that exist between entities, whether standard or customized. These include not only people, places, organizations and companies, but also URLs, email addresses, phone numbers, and values such as dates, currency and denomination, and percentages; virtually any other entity you need can be tagged or extracted.
Starting with a simple definition of text mining, we can now understand the value of a semantic technology and how strategic it can be in certain applications: for example, if John is an international terrorist.
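As a rough illustration of what "relationships between entities" means for our sample sentence, the sketch below pulls out subject-relation-object triples with a single hand-written pattern. This is not how Cogito or any semantic engine works: the pattern, the relation names and the `extract_relations` function are all assumptions made up for this example, and the reading of "he" and "his" is hard-coded rather than understood.

```python
import re

# Hypothetical pattern covering exactly one sentence shape:
# "<A> said he saw his brother <B> in <PLACE>"
PATTERN = re.compile(
    r"(?P<speaker>[A-Z][a-z]+) said he saw his brother "
    r"(?P<sibling>[A-Z][a-z]+) in (?P<place>[A-Z][a-z]+)"
)

def extract_relations(text):
    """Return (subject, relation, object) triples, or [] if the pattern fails."""
    match = PATTERN.search(text)
    if match is None:
        return []
    speaker, sibling, place = match.group("speaker", "sibling", "place")
    # Assumption: "he" and "his" both refer back to the speaker.
    return [
        (speaker, "saw", sibling),
        (speaker, "brother_of", sibling),
        (sibling, "seen_in", place),
    ]

relations = extract_relations("John said he saw his brother Jack in Baghdad.")
for triple in relations:
    print(triple)
```

Change the wording even slightly ("John told us that his brother Jack was spotted in Baghdad") and the pattern returns nothing, which is exactly why understanding words in context matters more than matching strings.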
As it happens, Expert System’s semantic software was recently named as a finalist for the 2016 SIIA CODiE Awards in two categories: Best Text Analytics & Semantic Technology, and Best Metadata Management.
See Cogito’s text mining in action and try it for yourself.