Word meaning and sentence meaning in semantics
A technology that strives to understand human communication must be able to understand meaning in language. In this post, we take a deeper look at a core component of our Cogito AI technology, the semantic disambiguator, and how it determines word meaning and sentence meaning.
To start, let’s clarify our definitions of words and sentences from a linguistic point of view.
Defining words and sentences
A “word” is a string of characters that can have different meanings (jaguar: car or animal?; driver: one who drives a vehicle or a part of a computer system?; rows: the plural noun or the third-person singular of the verb to row?). A “sentence” is a group of words that expresses a specific thought; to capture it, we need to understand how the words relate to one another. In “Paul, Jack’s brother, is married to Linda”, for example, Linda is married to Paul, not Jack.
Going back to school
To understand word meaning and sentence meaning, our semantic disambiguator engine must automatically resolve ambiguities and determine the meaning of each word in a text.
Let’s consider this sentence:
John Smith is accused of the murders of two police officers.
To understand the word meaning and sentence meaning in any phrase, the disambiguator performs four consecutive phases of analysis:
1. Lexical analysis
This phase breaks up the stream of text into meaningful elements called tokens. The sequence of “atomic” elements resulting from this process is processed further in the next phase of analysis.
John > human proper noun
Smith > human proper noun
is > verb
of > preposition
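As a rough illustration of this phase, here is a minimal Python sketch of tokenization. It is not how the Cogito disambiguator is implemented; the regular expression and the tokenize function are our own assumptions, chosen only to show how a stream of text becomes a sequence of atomic elements.

```python
import re

# Hypothetical pattern (an assumption for this sketch): a token is either
# a word, optionally containing internal apostrophes or hyphens, or a
# single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+(?:['’-]\w+)*|[^\w\s]")


def tokenize(text):
    """Split raw text into a flat list of word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


sentence = "John Smith is accused of the murders of two police officers."
tokens = tokenize(sentence)
print(tokens)
# ['John', 'Smith', 'is', 'accused', 'of', 'the', 'murders', 'of',
#  'two', 'police', 'officers', '.']
```

At this point “police” and “officers” are still two separate tokens; grouping them is left to the next phase.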
2. Grammatical analysis
During this phase, each token in the text is assigned a part of speech. The semantic disambiguator recognizes inflected forms and conjugations and identifies nouns, proper nouns and so on. Starting from a mere sequence of tokens, this phase produces a sequence of elements: some tokens are grouped to form collocations (police officer), and every token or group of tokens is represented by a block that identifies its part of speech.
John Smith > human proper noun
is accused > nominal predicate
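The idea of tagging tokens and grouping them into blocks can be sketched in a few lines of Python. Everything here is a simplifying assumption: the tiny LEXICON and COLLOCATIONS tables, the rule that treats an unknown capitalized word as a proper noun, and the tag and group helpers are hypothetical and only cover our example sentence. A real disambiguator relies on a far richer lexicon, and this sketch does not attempt to group verb forms such as “is accused”.

```python
# Toy lexicon (illustrative only): surface form -> (base form, part of speech).
LEXICON = {
    "is": ("be", "verb"),
    "accused": ("accuse", "verb"),
    "of": ("of", "preposition"),
    "the": ("the", "article"),
    "murders": ("murder", "noun"),
    "two": ("two", "numeral"),
    "police": ("police", "noun"),
    "officers": ("officer", "noun"),
}

# Multi-word expressions, keyed by the base forms of their two parts.
COLLOCATIONS = {("police", "officer"): "noun"}


def tag(tokens):
    """Assign a base form and a part of speech to each token."""
    tagged = []
    for tok in tokens:
        if tok.lower() in LEXICON:
            lemma, pos = LEXICON[tok.lower()]
        elif tok[:1].isupper():
            lemma, pos = tok, "proper noun"  # assumption: unknown + capitalized
        else:
            lemma, pos = tok, "unknown"
        tagged.append((tok, lemma, pos))
    return tagged


def group(tagged):
    """Merge known collocations and pairs of adjacent proper nouns."""
    blocks = []
    i = 0
    while i < len(tagged):
        tok, lemma, pos = tagged[i]
        if i + 1 < len(tagged):
            next_tok, next_lemma, next_pos = tagged[i + 1]
            merged_pos = COLLOCATIONS.get((lemma, next_lemma))
            if merged_pos is None and pos == next_pos == "proper noun":
                merged_pos = "proper noun"
            if merged_pos is not None:
                blocks.append(
                    (f"{tok} {next_tok}", f"{lemma} {next_lemma}", merged_pos)
                )
                i += 2
                continue
        blocks.append((tok, lemma, pos))
        i += 1
    return blocks


tokens = ["John", "Smith", "is", "accused", "of", "the",
          "murders", "of", "two", "police", "officers"]
blocks = group(tag(tokens))
for text, lemma, pos in blocks:
    print(f"{text} > {pos} (base form: {lemma})")
```

Note that the collocation is matched on base forms, so the plural “police officers” is still recognized as an instance of “police officer”.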
3. Syntactical analysis
During this phase, the disambiguator groups words on several levels in order to reproduce the way that words are linked to one another to form sentences. Sentences are then analyzed further to attribute a logical role to each phrase (subject, object, verb, complement, etc.) and, whenever possible, to identify the relationships between verbs, subjects and objects, and between these and other complements. In our example, the sentence consists of a single independent clause, where John Smith is recognized as the subject of the sentence.
John Smith > subject
is accused > nominal predicate
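To make the notion of logical roles concrete, here is a deliberately naive Python sketch. It assumes a single clause in subject, verb, complement order, which happens to hold for our example but not for language in general; the assign_roles function and the block format are hypothetical and are not the disambiguator's actual parser.

```python
def assign_roles(blocks):
    """Label the parts of a single-clause sentence.

    Assumes subject-verb-complement order: everything before the first
    verb is the subject, the span of verbs is the predicate, and the
    rest is the complement.
    """
    verb_positions = [i for i, (_, pos) in enumerate(blocks) if pos == "verb"]
    if not verb_positions:
        return {}
    first, last = verb_positions[0], verb_positions[-1]

    def join(part):
        return " ".join(text for text, _ in part)

    return {
        "subject": join(blocks[:first]),
        "predicate": join(blocks[first:last + 1]),
        "complement": join(blocks[last + 1:]),
    }


# Blocks as (text, part of speech), in the shape the previous phase produces.
blocks = [
    ("John Smith", "proper noun"),
    ("is", "verb"),
    ("accused", "verb"),
    ("of", "preposition"),
    ("the", "article"),
    ("murders", "noun"),
    ("of", "preposition"),
    ("two", "numeral"),
    ("police officers", "noun"),
]
roles = assign_roles(blocks)
for role, text in roles.items():
    print(f"{text} > {role}")
```

A sentence such as “Paul, Jack’s brother, is married to Linda” already breaks this positional shortcut, which is why real syntactic analysis has to model how phrases attach to one another.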
4. Semantic analysis
During the last and most complex phase, the tokens recognized during grammatical analysis are associated with a specific meaning. Each token can be associated with several concepts; the choice is made by considering the base form of each token with respect to its part of speech, the grammatical and syntactical characteristics of the token, the position of the token in the sentence and the relation of the token to the syntactical elements surrounding it.
Like the human brain, the disambiguator eliminates all candidate concepts for each token except one, which is then definitively associated with the token. When the disambiguator comes across an unknown element in a text (for example, human proper names), it tries to infer its meaning from the context in which the token appears.
is accused > to accuse > to blame
police officer > policeman, police woman, law enforcement officer
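The elimination of candidate concepts can be illustrated with a much simpler technique than the one described above: counting how many context words overlap with a set of cue words for each candidate, in the spirit of the classic Lesk approach. This is a stand-in for illustration, not the Cogito method, and it ignores part of speech, syntax and position. The SENSES table, its cue words and the disambiguate function are all hypothetical.

```python
# Hypothetical sense inventory: word -> {concept: cue words for that concept}.
SENSES = {
    "jaguar": {
        "animal": {"jungle", "prey", "cat", "hunts", "spotted"},
        "car": {"engine", "drive", "parked", "garage", "road"},
    },
    "driver": {
        "person who drives a vehicle": {"car", "road", "license", "taxi"},
        "computer component": {"install", "printer", "update", "device"},
    },
}


def disambiguate(word, context_tokens):
    """Keep the one candidate concept that best matches the context.

    Returns None for words missing from the inventory. On a tie, the
    candidate listed first in SENSES wins.
    """
    candidates = SENSES.get(word.lower())
    if not candidates:
        return None
    context = {token.lower() for token in context_tokens}
    scores = {concept: len(cues & context) for concept, cues in candidates.items()}
    return max(scores, key=scores.get)


print(disambiguate("jaguar", "The jaguar was parked in the garage".split()))
print(disambiguate("jaguar", "The jaguar hunts its prey in the jungle".split()))
print(disambiguate("driver", "Please update the printer driver".split()))
```

With these cue words the three calls select “car”, “animal” and “computer component” respectively. The same string resolves to different concepts purely because of the words around it.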
Originally published October 2016, updated March 2020