Figure 9. Accuracy of product versus disjunction with AV-ENG and GI.
[Plot: Accuracy (vertical axis) against Threshold (horizontal axis); series: Product, Disjunction.]
Figure 10. Accuracy of product versus disjunction with AV-CA and GI.
[Plot: Accuracy (vertical axis, 0 to 100) against Threshold (horizontal axis, 0 to 100); series: Product, Disjunction.]
Figure 11. Accuracy of product versus disjunction with TASA and GI.
[Plot: Accuracy (vertical axis, 0 to 100) against Threshold (horizontal axis, 0 to 100); series: Product, Disjunction.]
5.6. SO-LSA Baseline
Table 6 shows the performance of SO-LSA on TASA with the HM lexicon. The
experiment used the online demonstration of LSA, mentioned in Section 5.1. The TASA
corpus was used to generate a matrix X with 92,409 rows (words) and 37,651 columns
(each document in TASA corresponds to one column), and SVD was used to reduce the
matrix to 300 dimensions. This is the baseline configuration of SO-LSA, as described in
Section 3.2.
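As a rough illustration of how a score could be read off the reduced space, the sketch below assumes that each word is represented by its row vector after the SVD reduction, that similarity is the cosine between two such vectors, and that SO-LSA is the summed similarity to positive paradigm words minus the summed similarity to negative paradigm words. The function names, the two-dimensional toy vectors, and the single-word paradigm lists are hypothetical; the actual configuration uses 300 dimensions and the paradigm words of the earlier sections.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def so_lsa(word, vectors, pos_words, neg_words):
    """Similarity to the positive paradigm words minus similarity to the negative ones."""
    w = vectors[word]
    pos = sum(cosine(w, vectors[p]) for p in pos_words)
    neg = sum(cosine(w, vectors[n]) for n in neg_words)
    return pos - neg


# Hypothetical 2-dimensional vectors standing in for rows of the reduced matrix.
toy_vectors = {
    "good": [1.0, 0.1],
    "bad": [0.1, 1.0],
    "pleasant": [0.9, 0.2],
    "awful": [0.2, 0.9],
}

score_pleasant = so_lsa("pleasant", toy_vectors, ["good"], ["bad"])
score_awful = so_lsa("awful", toy_vectors, ["good"], ["bad"])
```

With these toy vectors, "pleasant" lies closer to "good" and receives a positive score, while "awful" lies closer to "bad" and receives a negative one.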
Table 6. The accuracy of SO-LSA and SO-PMI with the HM lexicon and TASA.

Percent of full test set   Size of test set   Accuracy of SO-LSA   Accuracy of SO-PMI
100%                       1336               67.66%               61.83%
75%                        1002               73.65%               64.17%
50%                        668                79.34%               46.56%
25%                        334                88.92%               70.96%
For ease of comparison, Table 6 also gives the performance of SO-PMI on TASA
with the HM lexicon, copied from Table 4. LSA has not yet been scaled up to corpora of
the sizes of AV-ENG or AV-CA, so we cannot compare SO-LSA and SO-PMI on these
larger corpora. Figure 12 presents a more detailed comparison, as the threshold varies
from 5% to 100% in increments of 5%.
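The threshold sweep can be sketched as follows, assuming that a threshold of N% keeps the N% of test words with the largest absolute SO value and measures accuracy on that subset only. The function names, the floor rounding of the subset size, and the toy scores are hypothetical and are not taken from the experiments.

```python
def subset_size(total, percent):
    """Number of words kept at an integer percent threshold (floor rounding is an assumption)."""
    return max(1, (total * percent) // 100)


def accuracy_at_threshold(scored, percent):
    """Accuracy on the top `percent` of words ranked by |SO|.

    `scored` is a list of (so_value, true_label) pairs with labels +1 or -1.
    A word is classified as positive when its SO value is greater than zero.
    """
    ranked = sorted(scored, key=lambda item: abs(item[0]), reverse=True)
    kept = ranked[:subset_size(len(ranked), percent)]
    correct = sum(1 for so, label in kept if (so > 0) == (label > 0))
    return correct / len(kept)


# Hypothetical scores: two confident correct calls, two low-confidence errors.
toy_scores = [(2.0, +1), (-1.5, -1), (0.4, -1), (-0.1, +1)]

# Sweep from 5% to 100% in increments of 5%.
curve = [(p, accuracy_at_threshold(toy_scores, p)) for p in range(5, 101, 5)]
```

Under this reading, the test set sizes in Table 6 follow directly from the full set of 1336 words: 75% gives 1002, 50% gives 668, and 25% gives 334.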