[Figure 12 plot: accuracy (y-axis) versus threshold (x-axis), with curves for SO-LSA and SO-PMI.]
Figure 12. Comparison of SO-LSA and SO-PMI with the HM lexicon and TASA.
Table 7 and Figure 13 give the corresponding results for the GI lexicon. The accuracy
is slightly lower with the GI lexicon, but we see the same general trend as with the HM
lexicon. SO-PMI and SO-LSA have approximately the same accuracy when evaluated on
the full test set (threshold 100%), but SO-LSA rapidly pulls ahead as we decrease the
percentage of the test set that is classified. It appears that the magnitude of SO (how far
the score lies from zero) is a better indicator of confidence for SO-LSA than for SO-PMI, at
least when the corpus is relatively small.
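The threshold can be read as follows: words are ranked by the absolute value of their SO score, only the top fraction of the ranking is classified (positive if SO is above zero), and accuracy is measured on that subset alone. The sketch below assumes that reading. The function name `accuracy_at_threshold`, the toy scores, and the choice to count SO = 0 as negative are illustrative assumptions, not taken from the experiments.

```python
from typing import Sequence


def accuracy_at_threshold(scores: Sequence[float], labels: Sequence[int],
                          fraction: float) -> float:
    """Accuracy on the `fraction` of words with the largest |SO|.

    scores:   SO values (e.g. from SO-LSA or SO-PMI).
    labels:   gold orientations, +1 for positive and -1 for negative.
    fraction: share of the test set to classify, in (0, 1].
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = max(1, round(len(scores) * fraction))
    # Rank words by confidence, i.e. by the magnitude of their SO score.
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    kept = ranked[:n]
    # Assumption: SO > 0 is predicted positive; SO <= 0 is predicted negative.
    correct = sum(1 for i in kept if (1 if scores[i] > 0 else -1) == labels[i])
    return correct / n


# Toy example: the two errors have small |SO|, so they drop out first.
scores = [2.5, -1.8, 0.3, -0.2, 1.1, -0.05]
labels = [1, -1, -1, 1, 1, -1]
print(accuracy_at_threshold(scores, labels, 1.0))  # 4 of 6 correct
print(accuracy_at_threshold(scores, labels, 0.5))  # 3 of 3 correct
```

When |SO| tracks confidence well, accuracy rises as the fraction shrinks, which is the pattern SO-LSA shows.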
Table 7. The accuracy of SO-LSA and SO-PMI with the GI lexicon and TASA.

Percent of full test set   Size of test set   Accuracy of SO-LSA   Accuracy of SO-PMI
100%                       3596               65.27%               61.26%
75%                        2697               71.04%               63.92%
50%                        1798               75.58%               47.33%
25%                        899                81.98%               68.74%
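A quick arithmetic check on Table 7: each subset size is the stated percentage of the 3596-word test set, and the SO-LSA advantage widens from about 4 percentage points on the full test set to about 13 points at 25%, with a 28-point spike at 50% where SO-PMI accuracy dips. The values are transcribed from the table; the code only rounds and subtracts.

```python
# Values transcribed from Table 7 (GI lexicon, TASA corpus).
FULL_SIZE = 3596
rows = [
    # (fraction of test set, SO-LSA accuracy %, SO-PMI accuracy %)
    (1.00, 65.27, 61.26),
    (0.75, 71.04, 63.92),
    (0.50, 75.58, 47.33),
    (0.25, 81.98, 68.74),
]

# Subset sizes implied by each fraction of the full test set.
sizes = [round(FULL_SIZE * frac) for frac, _, _ in rows]
# SO-LSA minus SO-PMI, in percentage points.
gaps = [round(lsa - pmi, 2) for _, lsa, pmi in rows]

for (frac, lsa, pmi), size, gap in zip(rows, sizes, gaps):
    print(f"{frac:>4.0%}  n={size:>4}  SO-LSA {lsa:.2f}%  "
          f"SO-PMI {pmi:.2f}%  gap {gap:+.2f}")
```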
In addition to its lower accuracy, SO-PMI appears less stable than SO-LSA,
especially as the threshold drops below 75%: in Table 7 its accuracy falls to 47.33% at the
50% threshold before recovering to 68.74% at 25%. Comparing with Figure 6, we see that,
although a larger neighbourhood makes SO-PMI more stable, even a neighbourhood of
1000 words (which is comparable to using AND with AltaVista) will not bring SO-PMI up to
the accuracy levels of SO-LSA.
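The neighbourhood here is the window within which a word and a paradigm word must co-occur to count as a hit. The sketch below computes SO-PMI from window co-occurrence counts over a tokenized corpus, assuming the product form SO-PMI(word) = log2 of [∏ hits(word NEAR pword) · ∏ hits(nword)] / [∏ hits(word NEAR nword) · ∏ hits(pword)]. The following are assumptions for illustration: `window` counts tokens on each side of the target word, a smoothing constant of 0.01 is added to every count so the logarithm is always defined, and the helper names and toy corpus are hypothetical. A larger `window` corresponds to a larger neighbourhood.

```python
import math
from collections import Counter
from typing import Sequence


def near_hits(tokens: Sequence[str], word: str, other: str, window: int) -> int:
    """Count occurrences of `other` within `window` tokens of each occurrence of `word`."""
    count = 0
    for i, tok in enumerate(tokens):
        if tok != word:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        count += sum(1 for j in range(lo, hi) if j != i and tokens[j] == other)
    return count


def so_pmi(tokens: Sequence[str], word: str, pos_words: Sequence[str],
           neg_words: Sequence[str], window: int = 10,
           smoothing: float = 0.01) -> float:
    """SO-PMI as a sum of log-ratios (hypothetical helper; smoothing is an assumption)."""
    freq = Counter(tokens)
    score = 0.0
    # Positive paradigm words: + log2 hits(word NEAR p) - log2 hits(p)
    for p in pos_words:
        score += (math.log2(near_hits(tokens, word, p, window) + smoothing)
                  - math.log2(freq[p] + smoothing))
    # Negative paradigm words: - log2 hits(word NEAR n) + log2 hits(n)
    for n in neg_words:
        score -= (math.log2(near_hits(tokens, word, n, window) + smoothing)
                  - math.log2(freq[n] + smoothing))
    return score


# Toy corpus: "nice" sits near "good", "nasty" sits near "bad".
tokens = ["nice", "good", "x", "x", "x", "x", "bad", "nasty"]
print(so_pmi(tokens, "nice", ["good"], ["bad"], window=2))   # positive
print(so_pmi(tokens, "nasty", ["good"], ["bad"], window=2))  # negative
```

With a small window, many word/paradigm pairs have zero hits and the score is dominated by the smoothing constant, which is one plausible reason a larger neighbourhood makes SO-PMI more stable.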