6.9. Juft ravish (paired adverb) – RRCpl
6.10. Takror ravish (reduplicated adverb) – RRRep
6.11. Tub ravish (simple adverb) – RRD0
6.12. Yasama ravish (derived adverb) – RRD1
7. Bog‘lovchi (conjunction) – C
7.1. Sof bogʻlovchi (pure conjunction) – C0
7.2. Vazifadosh bogʻlovchi (functional conjunction) – C1
7.3. Teng bogʻlovchi (coordinating conjunction) – CEq (Equal)
7.4. Ergashtiruvchi bogʻlovchi (subordinating conjunction) – CS
8. Koʻmakchi (postposition) – II
8.1. Sof koʻmakchi (pure postposition) – II0
8.2. Vazifadosh koʻmakchi (functional postposition) – II1
9. Yuklama (particle) – Prt (Particle)
9.1. Soʻz yuklama (word particle) – PrtD
9.2. Qo‘shimchasimon yuklama (affix-like particle) – PrtComp
10. Modal so‘z (modal word) – MD
11. Undov soʻz (interjection) – UH
11.1. His-hayajon undovlari (emotional interjections) – UHEm (Emotion)
11.2. Buyruq-xitob undovlari (imperative-exclamatory interjections) – UHImp (Imperative)
12. Taqlid soʻzlar (imitative words) – IM (Imitative)
Using HMM to develop a POS Tagger
The words of the Uzbek language can be treated as observable events (given to us in the data), and the POS tags that correspond to those words as hidden states, so an HMM can be used to assign POS tags. Note that we refer to observed events as "observations" and to hidden states simply as "states".
The Hidden Markov Model has the following components:
Table 3. Components of HMM
Q = q_1, q_2, …, q_N – a set of N states (POS tags).
A = a_11, a_12, …, a_NN – a transition probability matrix, where each a_ij represents the probability of moving from state i to state j, such that ∑_j a_ij = 1 for every i.
O = o_1, o_2, …, o_T – a sequence of T observations, each drawn from a vocabulary V = v_1, v_2, …, v_V.
B = b_i(o_t) – a sequence of observation likelihoods (emission probabilities), each expressing the probability of observation o_t being generated from state i.
π = π_1, π_2, …, π_N – the initial probability distribution over the states; π_i is the probability that the Markov chain starts in state i, with ∑_i π_i = 1.
In that,
- Q is the set of possible tags;
- A is the matrix of tag transition probabilities P(t_i | t_i−1), i.e. the probability that the current tag follows the previous tag. Example: the value A[Verb][Noun] is calculated as P(Noun | Verb) = Count(Verb, Noun) / Count(Verb);
- O is the sequence of observations (the words of the sentence);
- B is the matrix of emission probabilities P(w_i | t_i), representing the probability that a given tag (say, a verb) is associated with a given word. For example, the emission probability B[Verb][pot] is calculated using the following formula: P(pot | Verb) = Count(pot, Verb) / Count(Verb).
It should be noted that the values of the Count() function mentioned above are taken from the tagged corpus (data) of the Uzbek language used to train the HMM model [16]; a count-based sketch of this estimation is given after Figure 7. The matrices "A" and "B" of the HMM model for the sentence “Bu yoqdagi odamlar menga juda yoqdi” ("I really liked the people here") look like this:
Figure 7. Transition and emission matrices in the HMM model
Here, the solid black lines in "A" represent the values of the transition matrix, and the dotted black lines in "B" represent the emission matrix for a system with Q = {MD, VB, NN}.
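A minimal sketch of how such transition and emission tables can be estimated from a tagged corpus with the Count() formulas above is given below; the toy corpus, the tag labels, and the function name estimate_hmm are illustrative assumptions, not the actual Uzbek corpus or implementation of [16]:

```python
from collections import defaultdict

def estimate_hmm(tagged_sentences):
    """Estimate transition (A) and emission (B) probabilities by counting."""
    tag_count = defaultdict(int)          # Count(tag)
    transition_count = defaultdict(int)   # Count(prev_tag, tag)
    emission_count = defaultdict(int)     # Count(word, tag)

    for sentence in tagged_sentences:
        prev_tag = "<s>"                  # sentence-start pseudo-tag
        tag_count[prev_tag] += 1
        for word, tag in sentence:
            tag_count[tag] += 1
            transition_count[(prev_tag, tag)] += 1
            emission_count[(word, tag)] += 1
            prev_tag = tag

    # A[prev, cur] = Count(prev, cur) / Count(prev)
    A = {(p, c): n / tag_count[p] for (p, c), n in transition_count.items()}
    # B[tag, word] = Count(word, tag) / Count(tag)
    B = {(t, w): n / tag_count[t] for (w, t), n in emission_count.items()}
    return A, B

# Illustrative toy corpus (tags are hypothetical, not gold annotations).
corpus = [
    [("Bu", "MD"), ("odamlar", "NN"), ("yoqdi", "VB")],
    [("odamlar", "NN"), ("keldi", "VB")],
]
A, B = estimate_hmm(corpus)
print(A[("NN", "VB")])     # P(VB | NN)
print(B[("VB", "yoqdi")])  # P(yoqdi | VB)
```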
Decoding using HMM
Suppose an HMM, consisting of a transition matrix and an emission matrix, and a sequence of observations O = o_1, o_2, …, o_T (the words of the corpus sentences) are given. From these values, we have to determine the most probable sequence of states Q = q_1, q_2, …, q_T (POS tags). In order to decode a sequence of tags using the HMM, two main assumptions are made:
- the probability of the current word appearing depends only on its own tag and does not depend on neighboring words or tags;
- the probability of a tag occurring does not depend on the entire sequence of previous tags, but only on the immediately preceding tag (bigram); the resulting factorization is shown below.
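Under these two assumptions (standard for a bigram HMM tagger), the probability of a tag sequence reduces to a product of the emission and transition probabilities defined above, and the decoder searches for the tag sequence that maximizes this product:

t̂_1 … t̂_n = argmax over t_1 … t_n of ∏_{i=1…n} P(w_i | t_i) · P(t_i | t_i−1)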
The pseudocode of the HMM algorithm is given below:
Hidden Markov Model Algorithm
Initialize:
    ς ← N × k array; X ← N × k array
    For s = 1 … k: ς[1, s] ← π(s) · Pr[O_1 | s]
Populate ς and X:
    For i = 2 … N:
        For s = 1 … k:
            xprev ← argmax_x { ς[i−1, x] · Pr[x → s] }
            X[i, s] ← xprev
            ς[i, s] ← ς[i−1, xprev] · Pr[xprev → s] · Pr[O_i | s]
Reconstruct OptPath:
    s ← argmax_x { ς[N, x] }
    OptPath ← EmptyList
    For j = N … 1:
        OptPath ← s :: OptPath
        if j > 1:
            s ← X[j, s]
    Return OptPath
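A runnable sketch of the pseudocode above in Python is given below; the probability tables are passed in as plain dictionaries, and the names (start_p, trans_p, emit_p) are illustrative rather than taken from the paper's implementation:

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state (tag) sequence for the observations.

    start_p[s]     - initial probability pi(s)
    trans_p[x][s]  - transition probability Pr[x -> s]
    emit_p[s][o]   - emission probability Pr[o | s]
    """
    # grid[i][s] holds the best path probability ending in state s at position i;
    # back[i][s] remembers which previous state produced that best probability.
    grid = [{s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}]
    back = [{}]

    for i in range(1, len(observations)):
        grid.append({})
        back.append({})
        for s in states:
            # Choose the previous state that maximizes the path probability.
            prev = max(states, key=lambda x: grid[i - 1][x] * trans_p[x].get(s, 0.0))
            back[i][s] = prev
            grid[i][s] = (grid[i - 1][prev] * trans_p[prev].get(s, 0.0)
                          * emit_p[s].get(observations[i], 0.0))

    # Reconstruct the optimal path by following the back-pointers.
    s = max(states, key=lambda x: grid[-1][x])
    path = [s]
    for i in range(len(observations) - 1, 0, -1):
        s = back[i][s]
        path.insert(0, s)
    return path
```

With transition and emission tables estimated as in the earlier counting sketch, calling this function on the words of a sentence returns the highest-probability tag sequence for that sentence.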
Viterbi algorithm
The decoding process used for the HMM is called the Viterbi algorithm [26]. First, it is necessary to form a probability matrix called a grid. The columns of this matrix correspond to the sequence of words in a sentence, and the rows represent the hidden states (all possible POS tags). The Viterbi grid corresponding to the sentence “Shirin nodir toshni o‘tga otdi” ("Shirin threw the rare stone into the fire") looks like this:
Figure 8. Viterbi grid corresponding to this sentence
In the Viterbi grid above, one can see the columns corresponding to the words of the given sentence (Shirin nodir toshni o‘tga otdi, "Shirin threw the rare stone into the fire") and the rows representing all known POS tags (RR, NUM, N, JJ, VB, C, P). The data in this grid can be interpreted as follows:
- each cell of the grid is denoted V(t, j), where t indexes a column (word position) and j indexes a row (POS tag); V(t, j) is called the Viterbi path probability;
- after the first t observations, the HMM computes the probability of being in state j (the current POS tag) and identifies the sequence of states with the highest probability.
The value V(t, j) is calculated using the following formula:
V(t, j) = max over i = 1 … N of V(t−1, i) · a_ij · b_j(o_t)
Here:
– V(t−1, i) is the probability of the Viterbi path corresponding to the previous step;
– a_ij is the probability of transition from the previous state q_i to the current state q_j;
– b_j(o_t) is the probability of the observation o_t given the current state j.
Using the Viterbi algorithm requires the values of the transition matrix “A” and the emission matrix “B” of the HMM discussed above. In our example, a bigram HMM model is used, so each POS tag depends only on the previous tag. For the sentence “Shirin nodir toshni o‘tga otdi” ("Shirin threw the rare stone into the fire"), the result must be obtained in the following format: