POS TAGGING OF UZBEK TEXTS USING HIDDEN MARKOV MODELS (HMM) AND VITERBI ALGORITHM
Elov B., Hamroyeva Sh., Xusainova Z., Xudayberganov N., Yodgorov U., Yuldashev A.
O'zbekiston Milliy Universiteti (National University of Uzbekistan)
Abstract.
One of the common problems in Natural Language Processing (NLP) today is determining the categories of words in a given text. Determining whether the words of a sentence are nouns, pronouns, verbs, adverbs, etc. is known in NLP as the POS tagging task. Part-of-speech tagging (POS tagging, PoS tagging, or POST) is also called grammatical tagging or word-category disambiguation. A word in a text (corpus) is assigned to a particular category based on its context. This article presents methods and algorithms for tagging Uzbek texts with hidden Markov models and the Viterbi algorithm, based on a tagged corpus of the Uzbek language.
Keywords:
Parts of Speech Tagging, POS tagging, Markov chain, Hidden Markov Model, HMM, stochastic methods, NLP, transition probability, emission probability, Viterbi lattice, Viterbi algorithm.
Introduction
Parts of speech (POS) and named entities are important for learning the grammar of any language. Knowing whether a word is a noun or a verb tells us about its likely neighbors and about the syntactic structure it can belong to, which makes part-of-speech tagging a key step in syntactic analysis. Recognizing names of persons, places, etc. is also significant for many NLP tasks. In this article, we study part-of-speech tagging and the probability of word sequences, as well as named entity recognition (NER), which tags words as persons, places, or organizations.
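The HMM-plus-Viterbi approach named in the abstract can be illustrated with a minimal sketch. It estimates transition probabilities P(t_i | t_{i-1}) and emission probabilities P(w_i | t_i) by relative frequency from a tagged corpus, then fills a Viterbi lattice in log space and backtraces the most probable tag sequence. The corpus, the tag names (PRON, NOUN, ADJ, VERB), the functions `train_hmm` and `viterbi`, and the `floor` value used for unseen events are all hypothetical illustrations, not the authors' corpus, tag set, or smoothing method.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus of tagged Uzbek sentences (NOT the tagged corpus used in the paper).
TOY_CORPUS = [
    [("men", "PRON"), ("kitob", "NOUN"), ("o'qiyman", "VERB")],
    [("u", "PRON"), ("kitob", "NOUN"), ("oldi", "VERB")],
    [("men", "PRON"), ("yaxshi", "ADJ"), ("kitob", "NOUN"), ("o'qiyman", "VERB")],
]

START = "<s>"  # pseudo-tag marking the beginning of a sentence


def normalize(table):
    """Turn nested counts into conditional relative frequencies."""
    probs = {}
    for key, counts in table.items():
        total = sum(counts.values())
        probs[key] = {item: c / total for item, c in counts.items()}
    return probs


def train_hmm(corpus):
    """Estimate transition P(tag | previous tag) and emission P(word | tag) by counting."""
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for sentence in corpus:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return normalize(trans), normalize(emit)


def viterbi(words, trans, emit, floor=1e-6):
    """Return the most probable tag sequence; `floor` is a crude stand-in for smoothing."""
    if not words:
        return []
    tags = list(emit)

    def logp(table, given, item):
        return math.log(table.get(given, {}).get(item, floor))

    # lattice[i][t] = (best log-probability of a path ending in tag t at word i, backpointer)
    lattice = [{t: (logp(trans, START, t) + logp(emit, t, words[0]), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            best_prev = max(tags, key=lambda p: lattice[i - 1][p][0] + logp(trans, p, t))
            score = (lattice[i - 1][best_prev][0] + logp(trans, best_prev, t)
                     + logp(emit, t, words[i]))
            column[t] = (score, best_prev)
        lattice.append(column)

    # Backtrace from the best tag at the last position.
    path = [max(tags, key=lambda t: lattice[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(lattice[i][path[-1]][1])
    return path[::-1]


if __name__ == "__main__":
    trans, emit = train_hmm(TOY_CORPUS)
    print(viterbi(["men", "kitob", "o'qiyman"], trans, emit))  # ['PRON', 'NOUN', 'VERB']
```

Working in log space avoids numerical underflow when many small probabilities are multiplied over long sentences; a real tagger would replace the fixed `floor` with a proper smoothing scheme for unseen words and transitions.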
Four major categories occur in the languages of the world: nouns (including proper nouns), verbs, adjectives, and adverbs, along with a smaller category of interjections. English has all five categories, but not every language does [1; 2].
Nouns are words that refer to persons, places, things, or events. In many languages, including English and Uzbek, common nouns are divided into countable and uncountable nouns. Countable nouns have singular and plural forms (goat/goats, relationship/relationships) and can be counted (one goat, two goats). Uncountable nouns denote something conceived of as a homogeneous mass, so snow, salt, and justice are not counted [3; 4]. Parts of speech also have their own subgroups.
Verbs describe actions, states, and processes. Adjectives often describe properties or qualities of nouns, such as color (white, black), age (old, young), and value (good, bad). Some languages have no adjectives. In Korean, for example, words that are adjectives in English act as verbs, so the English adjective "beautiful" corresponds to a Korean verb meaning "to be beautiful". Adverbs express the manner, amount, degree, time, or place of an action. Pronouns act as a shorthand for referring to an entity or event. Personal pronouns refer to persons or objects. Possessive pronouns are other forms of pronouns