Information extraction from the web using a search engine


• We collect the n top-ranked tracks according to Last.fm for every artist in
  the list.
• For each of these tracks, we retrieve the most popular tags.
• We compute a normalized form for the tags of each track.
• For the list of the normalized tags of i ∈ I_a, we retain only those whose
  normalized form is applied to at least m out of the n top-ranked tracks for
  the respective artist.
In our experiments, we retain the tags that are applied to at least 3 out of
the 10 top-ranked tracks for the respective artist. The tags from Table 6.1 for
Eminem that are removed after normalization and track filtering are given in
Table 6.6.
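
The track-filtering steps above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names `normalize` and `track_filter` are hypothetical, the tag lists are assumed to have been retrieved beforehand, and the normalization shown (lowercasing and dropping non-alphanumeric characters) is a stand-in for the normalized form defined earlier in the chapter.

```python
from collections import Counter


def normalize(tag):
    """Hypothetical normalization: lowercase, keep only letters and digits."""
    return "".join(ch for ch in tag.lower() if ch.isalnum())


def track_filter(artist_tags, track_tags, m=3):
    """Keep the artist tags whose normalized form occurs on at least m tracks.

    artist_tags: list of tags applied to the artist.
    track_tags:  one list of tags per top-ranked track (n lists in total).
    """
    counts = Counter()
    for tags in track_tags:
        # A normalized form counts at most once per track.
        counts.update({normalize(t) for t in tags})
    return [t for t in artist_tags if counts[normalize(t)] >= m]


# Toy data (invented for illustration): 4 tracks, threshold m = 3.
artist_tags = ["Hip-Hop", "rap", "seen live"]
track_tags = [
    ["hip hop", "rap"],
    ["hiphop"],
    ["Hip-Hop", "rap"],
    ["rap", "favorites"],
]
print(track_filter(artist_tags, track_tags, m=3))
```

With n = 10 tracks per artist and m = 3, this corresponds to the setting chosen in the experiments.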
6.3.3 Checking the Consistency of the Tags
The artist similarities as provided by Last.fm are based on the listening
behavior of the users. Since we want to use the Last.fm data as ground truth in
music characterization, we investigate whether the tagging is consistent, i.e.
similar artists should share a large number of tags.
To check this criterion, we selected the set of 224 artists used in [Knees et
al., 2004]^6, where the artists were originally chosen to be representatives of
14 different genres. For each of the 224 artists, we collected the 100 most
similar artists according to Last.fm. For the resulting set of the 224 artists
and their 100 nearest neighbors, we downloaded the lists of the 100 most popular
tags. For each artist in the list of 224, we first compared its list of tags
with that of its most similar artist. We then did the same for the following
(less) similar artists in the list.

^6 http://www.cp.jku.at/people/knees/publications/artistlist224.html

[Plot "Average number of shared tags for k-NN": x-axis k (0 to 100), y-axis
average number of shared tags (0 to 70); series: normalized data, last.fm data,
track-filtered data.]
Figure 6.1. Average number of shared tags for the 224 artists.
We computed the average number of overlapping tags for the 224 artists and
their k nearest neighbors and display the results in Figure 6.1. Since often
fewer than 100 tags are assigned to each artist, especially after track
filtering, we also computed a relative similarity score for each of the 224
artists and their k nearest neighbors by taking the number of shared tags
relative to the total number of tags of the nearest neighbor. For example, if
an artist shares 34 out of its 40 tags with an artist in the list of 224, the
relative tag similarity score for this artist is 34/40. The average similarity
scores are given in Figure 6.2. The scores are computed using unfiltered,
normalized and track-filtered Last.fm data.
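
The two measures described above can be sketched as follows. This is a hedged illustration rather than the code used for Figures 6.1 and 6.2: all function and variable names are hypothetical, the relative score is assumed to be divided by the neighbor's tag count (as in the 34/40 example), and the averages are assumed to run over each artist's first k neighbors.

```python
from statistics import mean


def shared_tags(artist_tags, neighbor_tags):
    """Number of tags the two artists have in common."""
    return len(set(artist_tags) & set(neighbor_tags))


def relative_similarity(artist_tags, neighbor_tags):
    """Shared tags relative to the neighbor's total number of tags."""
    neighbor = set(neighbor_tags)
    if not neighbor:
        return 0.0  # assumption: an untagged neighbor contributes a score of 0
    return len(set(artist_tags) & neighbor) / len(neighbor)


def average_over_knn(tags, neighbors, k, measure):
    """Average `measure` over every seed artist and its k nearest neighbors.

    tags:      dict mapping artist -> list of tags.
    neighbors: dict mapping seed artist -> neighbors, most similar first.
    """
    return mean(
        measure(tags[seed], tags[nb])
        for seed, ranked in neighbors.items()
        for nb in ranked[:k]
    )


# Toy data (invented for illustration).
tags = {
    "a": ["rock", "pop", "indie"],
    "b": ["rock", "pop"],
    "c": ["jazz", "rock", "soul", "pop"],
}
neighbors = {"a": ["b", "c"]}
print(average_over_knn(tags, neighbors, 2, shared_tags))
print(average_over_knn(tags, neighbors, 2, relative_similarity))
```

Running the same functions on unfiltered, normalized and track-filtered tag lists would give one curve per data set as k varies.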
The average number and score of overlapping tags decrease only slightly for
the unfiltered and normalized data with increasing k. For the track-filtered
data, we even note a small increase in the relative number of tags shared
(starting from k = 25). This can be explained by the small number of tags that
remain after track filtering, as can be seen in Figure 6.1.
Using the unfiltered Last.fm tags of all retrieved artists, we estimate the
expected number of tags shared by two randomly chosen artists as 29.8 and the
relative number of shared tags as 0.58. When we filter the tags by normalization
and compare the normalized forms of the tags, we obtain an average of 29.8
shared tags, with a relative number of 0.62. For the track filtering, these
numbers are 3.87 and 0.64 respectively. Hence, the number of tags shared by
similar artists is indeed much larger than that shared by randomly chosen
artists.
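
One way to obtain such a random-pair baseline is to average over all ordered pairs of distinct artists. The sketch below is an assumption about how the estimate could be computed, since the text does not state whether pairs were enumerated or sampled. The name `random_pair_baseline` is hypothetical, and the relative score is assumed to be divided by the tag count of the second artist of each pair.

```python
from itertools import permutations
from statistics import mean


def random_pair_baseline(tags):
    """Expected shared-tag count and relative score for a random artist pair.

    tags: dict mapping artist -> collection of tags.
    Averages over all ordered pairs of distinct artists, so the asymmetric
    relative score is taken in both directions.
    """
    counts, ratios = [], []
    for first, second in permutations(tags, 2):
        a, b = set(tags[first]), set(tags[second])
        common = len(a & b)
        counts.append(common)
        # Assumption: an artist without tags yields a relative score of 0.
        ratios.append(common / len(b) if b else 0.0)
    return mean(counts), mean(ratios)


# Toy data (invented for illustration).
tags = {"a": ["rock", "pop"], "b": ["rock"], "c": ["jazz"]}
avg_shared, avg_relative = random_pair_baseline(tags)
print(avg_shared, avg_relative)
```

For a large collection of artists, sampling random pairs instead of enumerating all of them would give an estimate of the same quantities at lower cost.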

[Plot "Relative tag similarity for k-NN": x-axis k (0 to 100), y-axis relative
tag similarity (0.64 to 0.84); series: track-filtered data, normalized data,
last.fm data.]
Figure 6.2. Relative tag similarity score for the 224 artists and their k
nearest neighbors.
6.3.4 Evaluating with Data from a Folksonomy
In earlier work (e.g. [Schedl et al., 2006; Pohle, Knees, Schedl, & Widmer,
2007; Geleijnse & Korst, 2006b]), computed artist similarities were evaluated
using the assumption that two artists are similar when they share a genre. To
the best of our knowledge, only the tagging of artists with a single tag,
usually a genre name, has been addressed in the literature. In domains other
than music, too, the automatic creation of a list of tags from unstructured
texts from multiple pages on the web has not been addressed.
As the Last.fm data proves to be reliable, we propose to use it as a ground
truth for evaluating algorithms that identify tags for artists and compute
artist similarity. The use of such a rich, user-based ground truth gives better
insight into the performance of the algorithm and provides possibilities to
study the automatic labeling of artists with multiple tags. Moreover, by
evaluating a method using artists and Last.fm ground truth, we gain insight
into the output of the method. High-quality output for the musical artist data
may lead to confidence in the method on domains that cannot be evaluated as
easily.
A Dynamic Ground Truth Extraction Algorithm
As the perception of users changes over time, we propose a dynamic ground truth
to evaluate a populated ontology with tags and instances. In the evaluation
section of this chapter, we will use this evaluation method to evaluate
populated ontologies on the artists using Last.fm data. Moreover, an ontology
on books is similarly evaluated using the social website LibraryThing.com.
For the evaluation of similarity of instances in I
