However, the value of V(t-1) is the same constant (invariant) for every calculation in that cell. So, we have to multiply V(t-1) and a(i,j). Here,
i: previous tag;
j: current tag.
Relying on the considerations above, it is necessary to calculate the value max V(t-1) ∙ a(i,j), where j represents the current cell of the row (POS tag) in the column corresponding to the word "nodir" ("unique"). Also, to avoid confusion, the value of V[j,t] can be interpreted as the value corresponding to the j-th row and t-th column of the Viterbi matrix.
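The cell update just described, V[j,t] = max over i of V[i,t-1] ∙ a(i,j), can be sketched in code. The tag names and probability values below are illustrative placeholders rather than the paper's corpus estimates, and an emission factor b_j(w_t) is also included, as in the full Viterbi recurrence:

```python
# Minimal sketch of one Viterbi cell update:
#   V[j,t] = max_i V[i,t-1] * a(i,j) * b_j(w_t)
# All probabilities below are illustrative placeholders, not corpus estimates.

# Previous column of the Viterbi matrix, V[i, t-1], indexed by tag i.
V_prev = {"N": 0.1838, "MD": 0.0, "JJ": 0.0129}

# Transition probabilities a(i, j) = P(tag_j | tag_i), here for j = "MD".
a = {
    ("N", "MD"): 0.32,
    ("MD", "MD"): 0.01,
    ("JJ", "MD"): 0.07,
}

# Emission probability b_j(w_t) = P(word_t | tag_j) for the current word.
b_md = 0.05

# V[j, t] for j = "MD": maximise over all previous tags i.
v_md = max(V_prev[i] * a[(i, "MD")] for i in V_prev) * b_md

# Backpointer: the previous tag i that achieved the maximum.
best_prev = max(V_prev, key=lambda i: V_prev[i] * a[(i, "MD")])
```

Here the maximum is reached through the previous tag N, since 0.1838 ∙ 0.32 dominates the other two products.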
Let us study j = 2 (the case where the POS tag is MD). To do this, we need to calculate the value of V(2,2); the maximum value in the previous column is V(1,5) = 0.1838. Calculating every cell of the column in the same way:
V(2,1) = 0; V(2,2) = 0; V(2,3) = 0; V(2,6) = 0; V(2,7) = 0;
V(2,4) = V(1,5) * P(P | JJ) = 0.1838 * 0.07 = 0.0129;
V(2,5) = V(1,5) * P(P | N) = 0.1838 * 0.32 = 0.0588;
All values of V(i,j) are calculated according to the above method. After all the cells of the matrix V are filled, the tag with the maximum value in each column (word) is selected as that word's label. For example, N is selected as the POS tag for the word "Shirin".
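The whole procedure, filling the matrix column by column and then picking the highest-valued tag per column as described above, can be sketched as follows. The tag set, words, and all probabilities are illustrative placeholders, not the paper's corpus estimates:

```python
# Sketch: fill the Viterbi matrix column by column, then take the tag with the
# maximum value in each column (word), as the text describes.
# Tags, words, and probabilities are illustrative placeholders.

tags = ["N", "MD", "V"]
words = ["Shirin", "olma", "yedi"]            # illustrative Uzbek sentence

start = {"N": 0.5, "MD": 0.2, "V": 0.3}       # initial tag probabilities
a = {(i, j): 1 / len(tags) for i in tags for j in tags}  # uniform transitions
b = {                                         # emission P(word | tag), made up
    ("N", "Shirin"): 0.4, ("MD", "Shirin"): 0.01, ("V", "Shirin"): 0.02,
    ("N", "olma"): 0.3,   ("MD", "olma"): 0.01,   ("V", "olma"): 0.05,
    ("N", "yedi"): 0.01,  ("MD", "yedi"): 0.02,   ("V", "yedi"): 0.5,
}

# First column: V[j, 1] = start(j) * b_j(w_1).
V = [{j: start[j] * b[(j, words[0])] for j in tags}]

# Remaining columns: V[j, t] = max_i V[i, t-1] * a(i, j) * b_j(w_t).
for w in words[1:]:
    prev = V[-1]
    V.append({j: max(prev[i] * a[(i, j)] for i in tags) * b[(j, w)]
              for j in tags})

# One tag per word: the row with the maximum value in that column.
tagged = [max(col, key=col.get) for col in V]
```

With these placeholder probabilities the first word "Shirin" receives the tag N, matching the example above. Note that the standard Viterbi algorithm instead stores the argmax backpointer at each cell and backtracks from the final column; the per-column maximum shown here follows the simpler selection rule described in the text.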
Conclusions
This article studied the problem of POS tagging in NLP using the Hidden Markov Model and Markov chains. POS tagging algorithms were classified into rule-based and stochastic approaches. In addition, it was noted that a language corpus (data) should be POS-tagged automatically or manually before the POS tags of a given sentence can be determined.
As an alternative to relying on word frequencies in a language corpus alone, context-dependent labels for individual words were considered, based on the n-gram approach for calculating the probability of occurrence of a sequence of tags. Using tag-sequence probabilities and word-frequency measurements, the components of the Hidden Markov Model were described on the example of Uzbek lexical units. Moreover, the calculations over observable and hidden states needed to obtain the probabilities for POS tagging of Uzbek sentences with an HMM were given with examples. The HMM matrix was decoded using the Viterbi algorithm, POS tagging of an Uzbek sentence was performed, and the results were analyzed. Many NLP problems can be solved by POS tagging with the HMM and the Viterbi algorithm presented in this article.