6.4 Experimental Results
[Plot: Precision for k-NN Artist Similarity. x-axis: k (0 to 30); y-axis: precision (0.3 to 0.9); series: baseline, linear, sqrt.]
Figure 6.9. Precision for the sets of 224 artists using the three ambiguity estimators.
turned contained less related instances.
We present the results for the sets $I_{224}$ and $I_{1995}$ in Figures 6.9 and 6.10.
For the set of 224 artists, the performance of the methods using disambiguation is
slightly lower than that of the baseline approach. This result is expected, as no
ambiguous terms occur in the set of 224 artists. For the set of 1995 artists, however,
the results improve using either the uniform or the sqrt approach. We note that,
contrary to the set of 224 artists, the 1995 set does contain some ambiguous names,
such as Autograph, Gamma Ray and Hypocrisy.
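
To make the measure plotted in Figures 6.9 and 6.10 concrete, the following is a minimal sketch of one common way to compute precision for k-NN similarity. It assumes, since this section does not restate the definition, that every artist carries a single label (for example a genre) and that a neighbor counts as related when it shares that label. The function name `precision_at_k` and the toy artists and labels are hypothetical and serve only as illustration.

```python
from typing import Dict, List


def precision_at_k(neighbors: Dict[str, List[str]],
                   label: Dict[str, str],
                   k: int) -> float:
    """Mean fraction of each artist's k nearest neighbors that share its label.

    Assumption: `neighbors[a]` is already ranked from most to least similar.
    """
    scores = []
    for artist, ranked in neighbors.items():
        top = ranked[:k]
        if not top:
            continue  # no neighbors to evaluate for this artist
        hits = sum(1 for n in top if label.get(n) == label[artist])
        scores.append(hits / len(top))
    return sum(scores) / len(scores) if scores else 0.0


# Toy data (hypothetical, not taken from the experiments).
toy_neighbors = {
    "artist_a": ["artist_b", "artist_c"],
    "artist_b": ["artist_a", "artist_c"],
    "artist_c": ["artist_a", "artist_b"],
}
toy_label = {"artist_a": "rock", "artist_b": "rock", "artist_c": "jazz"}

for k in (1, 2):
    print(k, precision_at_k(toy_neighbors, toy_label, k))
```

Evaluating the function over a range of k for each estimator would give one curve per estimator, comparable in shape to the baseline, linear and sqrt curves in the figures.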
For the set of 1732 artists in our own collection, we compare the number of
times that ambiguous artist names occur among the 5 nearest neighbors for the
other artists (Table 6.10). We note that for the term Juli only one definition is
found. Although the distribution of ambiguous names is quite different for
$p_{lin}$ and $p_{sqrt}$, we cannot draw conclusions on which approach is better
suited, as currently no ground truth for artist similarity ranking is available.
Hence, a ground truth data set for such a diverse collection with ambiguous artist
names is needed. With such a set, we can obtain better insight into the quality of
web information extraction methods for these purposes.
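
The comparison in the paragraph above amounts to counting, for each ambiguous name, how often it appears among the 5 nearest neighbors of the other artists. A minimal sketch of that count follows, assuming the ranked neighbor lists are already available as a dictionary. The function name `count_ambiguous_in_neighbors` and the toy neighbor lists are hypothetical; they are not the data behind Table 6.10.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_ambiguous_in_neighbors(neighbors: Dict[str, List[str]],
                                 ambiguous: Iterable[str],
                                 k: int = 5) -> Counter:
    """Count how often each ambiguous name occurs in the top-k lists of other artists."""
    counts = Counter({name: 0 for name in ambiguous})
    for artist, ranked in neighbors.items():
        for n in ranked[:k]:
            # Skip an artist appearing in its own list; only other artists count.
            if n in counts and n != artist:
                counts[n] += 1
    return counts


# Toy ranked neighbor lists (hypothetical, for illustration only).
toy_lists = {
    "artist_x": ["Juli", "artist_y", "Gamma Ray"],
    "artist_y": ["Juli", "artist_x"],
    "Juli": ["artist_x", "artist_y"],
}

print(count_ambiguous_in_neighbors(toy_lists, ["Juli", "Gamma Ray", "Hypocrisy"]))
```

Running the same count once per estimator gives the two distributions that the text compares. Without a ground truth, the counts describe how the estimators differ but not which one is better.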
6.4.2 Categorizing Instances
In this subsection, we focus on experiments that address the categorization of the
instances in
I