Microsoft Word turney-littman-acm doc



Figure 8. AND versus NEAR with AV-CA and the GI lexicon. (Plot of accuracy against threshold, with one curve for NEAR and one for AND.)


5.5. Product versus Disjunction 
Recall equation (10), for calculating SO-PMI(word):

$$
\text{SO-PMI}(\mathit{word}) = \log_2 \left( \frac{\prod_{\mathit{pword} \in \mathit{Pwords}} \text{hits}(\mathit{word}\ \text{NEAR}\ \mathit{pword}) \cdot \prod_{\mathit{nword} \in \mathit{Nwords}} \text{hits}(\mathit{nword})}{\prod_{\mathit{pword} \in \mathit{Pwords}} \text{hits}(\mathit{pword}) \cdot \prod_{\mathit{nword} \in \mathit{Nwords}} \text{hits}(\mathit{word}\ \text{NEAR}\ \mathit{nword})} \right) \tag{17}
$$

As we discussed in Section 3.1, this equation requires fourteen queries to AltaVista for each word (ignoring the constant terms). In this section, we investigate whether the number of queries can be reduced by combining the paradigm words, using the OR operator.

For convenience, we introduce the following definitions:

$$
\mathit{Pquery} = \operatorname*{OR}_{\mathit{pword} \in \mathit{Pwords}} \mathit{pword} \tag{18}
$$

$$
\mathit{Nquery} = \operatorname*{OR}_{\mathit{nword} \in \mathit{Nwords}} \mathit{nword} \tag{19}
$$

Given the fourteen paradigm words, for example, we have the following (from equations (5), (6), (18), and (19)):

$$
\mathit{Pquery} = (\text{good OR nice OR} \ \ldots \ \text{OR superior}) \tag{20}
$$

$$
\mathit{Nquery} = (\text{bad OR nasty OR} \ \ldots \ \text{OR inferior}) \tag{21}
$$

We attempt to approximate (17) as follows:⁸

$$
\text{SO-PMI}(\mathit{word}) = \log_2 \left( \frac{\text{hits}(\mathit{word}\ \text{NEAR}\ \mathit{Pquery}) \cdot \text{hits}(\mathit{Nquery})}{\text{hits}(\mathit{word}\ \text{NEAR}\ \mathit{Nquery}) \cdot \text{hits}(\mathit{Pquery})} \right) \tag{22}
$$

Calculating the semantic orientation of a word using equation (22) requires only two queries per word, instead of fourteen (ignoring the constant terms, hits(Pquery) and hits(Nquery)).

Figure 9 plots the performance of product (equation (17)) versus disjunction (equation (22)) for SO-PMI with the AV-ENG corpus and the GI lexicon. Figure 10 shows the performance with the AV-CA corpus and Figure 11 with the TASA corpus. For the largest corpus, there is a clear advantage to using our original equation (17), but the two equations have similar performance with the smaller corpora. Since the execution time of SO-PMI is almost completely dependent on the number of queries sent to AltaVista, equation (22) executes seven times faster than equation (17). Therefore the disjunction equation should be preferred for smaller corpora and the product equation should be preferred for larger corpora.

⁸ We use OR here, because using AND or NEAR would almost always result in zero hits. We add 0.01 to the hits, to avoid division by zero.
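
The difference in query cost between equations (17) and (22) can be illustrated with a minimal Python sketch. Nothing here is the original implementation: `ToyCorpus`, its `hits` and `hits_near` methods, the sample documents, and the three-word paradigm lists are hypothetical stand-ins for AltaVista and the fourteen paradigm words. The ten-token NEAR window and the choice to add 0.01 to every hit count are also assumptions made for the illustration.

```python
import math
from typing import Sequence

SMOOTHING = 0.01  # footnote 8: added to hit counts to avoid division by zero


class ToyCorpus:
    """Hypothetical in-memory stand-in for a search engine (not AltaVista's API)."""

    def __init__(self, docs: Sequence[str], window: int = 10) -> None:
        self.docs = [doc.lower().split() for doc in docs]
        self.window = window   # assumed NEAR distance, in tokens
        self.near_queries = 0  # counts only the per-word NEAR queries

    def hits(self, terms: Sequence[str]) -> int:
        """Number of documents containing at least one of `terms` (an OR query)."""
        wanted = set(terms)
        return sum(1 for doc in self.docs if wanted & set(doc))

    def hits_near(self, word: str, terms: Sequence[str]) -> int:
        """Number of documents where `word` is within `window` tokens of any term."""
        self.near_queries += 1
        wanted = set(terms)
        count = 0
        for doc in self.docs:
            for i, token in enumerate(doc):
                if token != word:
                    continue
                lo, hi = max(0, i - self.window), i + self.window + 1
                if any(t in wanted for t in doc[lo:i] + doc[i + 1:hi]):
                    count += 1
                    break  # count each document at most once
        return count


def so_pmi_product(word: str, pwords: Sequence[str], nwords: Sequence[str],
                   corpus: ToyCorpus) -> float:
    """Equation (17), computed as a sum of logs rather than a log of products."""
    total = 0.0
    for pword in pwords:
        total += math.log2(corpus.hits_near(word, [pword]) + SMOOTHING)
        total -= math.log2(corpus.hits([pword]) + SMOOTHING)
    for nword in nwords:
        total += math.log2(corpus.hits([nword]) + SMOOTHING)
        total -= math.log2(corpus.hits_near(word, [nword]) + SMOOTHING)
    return total


def so_pmi_disjunction(word: str, pwords: Sequence[str], nwords: Sequence[str],
                       corpus: ToyCorpus) -> float:
    """Equation (22): the paradigm words are ORed into Pquery and Nquery."""
    numerator = (corpus.hits_near(word, pwords) + SMOOTHING) * (corpus.hits(nwords) + SMOOTHING)
    denominator = (corpus.hits_near(word, nwords) + SMOOTHING) * (corpus.hits(pwords) + SMOOTHING)
    return math.log2(numerator / denominator)


# Illustrative data only: shortened paradigm lists and made-up documents.
PWORDS = ["good", "nice", "superior"]
NWORDS = ["bad", "nasty", "inferior"]
DOCS = [
    "the service was good and the food was delicious",
    "a nice place with delicious desserts",
    "superior coffee and delicious pastry",
    "the soup was bad and bland",
    "nasty bland leftovers",
    "an inferior and bland imitation",
]

if __name__ == "__main__":
    for label, fn in (("product", so_pmi_product), ("disjunction", so_pmi_disjunction)):
        corpus = ToyCorpus(DOCS)
        score = fn("delicious", PWORDS, NWORDS, corpus)
        print(label, round(score, 2), "NEAR queries:", corpus.near_queries)
```

With these shortened lists the product form issues one NEAR query per paradigm word and the disjunction form issues two in total. With the fourteen paradigm words the same structure gives fourteen queries versus two, which is the sevenfold difference described above.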