6.4 Experimental Results
[Plot: Spearman’s rank correlation coefficient (rho) as a function of w.]
Figure 6.16. Spearman’s correlation coefficient between the computed tagging
of the 224 artists and the Last.fm ground truth.
are modest. Given the difficulty of the task and the nature of the ground truth, we
are nevertheless encouraged by the results.
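As a reminder of what the measure in Figure 6.16 expresses, the following is a minimal sketch of Spearman's rank correlation coefficient, computed as the Pearson correlation of the (tie-averaged) ranks. The function names and the toy scores are illustrative assumptions; how the rankings are built and aggregated in the experiment is not shown in this section.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined for a constant ranking")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for five tags: computed vs. ground truth.
computed = [1, 2, 3, 4, 5]
ground_truth = [2, 1, 4, 3, 5]
print(round(spearman_rho(computed, ground_truth), 3))  # 0.8
```

A value of 1 means the two rankings agree completely, 0 means no monotonic relation, and -1 means they are reversed.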
We observe that some frequently applied tags (e.g. ‘i want to hear everything
streamable by them’) were rarely identified in texts on the web. On the other
hand, among the best-scoring tags we find terms that seem less descriptive but
occur often on the web, for example good, hot and fun.
Tagging Books
In this second experiment, we focus on books and their tags. Using the social
website LibraryThing.com, we create a ground truth for the 500 most popular
books on this website (Table 6.16 gives the top 25 at the time of the
experiment). After normalization, the ground truth set of tags was reduced to
286 tags. The book titles have been slightly simplified by removing the text
after the colon (e.g. in Animal farm : a fairy story).
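The title simplification can be sketched as follows. This is only an illustration under the assumption that the text after the first colon is dropped and surrounding whitespace is trimmed; the function name is hypothetical.

```python
def simplify_title(title: str) -> str:
    """Keep only the part of a book title before the first colon."""
    return title.split(":", 1)[0].strip()


print(simplify_title("Animal farm : a fairy story"))  # Animal farm
```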
As two author-title combinations are less likely to co-occur within a sentence,
we gather the co-occurrence scores for the books in I_a using DM. We query the
book title together with the name of the author and gather the (at most) 100
resulting documents. Again, the pages of the evaluation website are excluded.
To identify co-occurrences, we scan the documents only for the titles of the
other books. The identification of co-occurrences between tags and books is
done in a similar fashion.
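The scanning step can be sketched roughly as below. The sketch assumes the documents for each book have already been retrieved and are available as plain strings, that a term is matched by case-insensitive substring search, and that a co-occurrence is counted once per document; the function name, the data layout and these matching choices are assumptions for illustration, not the procedure as defined for DM.

```python
from collections import Counter
from typing import Dict, List


def cooccurrence_counts(
    documents: Dict[str, List[str]],
    terms: List[str],
    max_docs: int = 100,
) -> Dict[str, Counter]:
    """For each book, count the documents in which each term occurs.

    documents maps a book title to the texts retrieved for that book.
    terms holds the strings to scan for: the titles of the other books,
    or the tags when relating tags to books.
    """
    counts: Dict[str, Counter] = {}
    for book, docs in documents.items():
        counter: Counter = Counter()
        for doc in docs[:max_docs]:  # at most 100 documents per query
            text = doc.lower()
            for term in terms:
                # Skip the queried book itself; it occurs trivially.
                if term != book and term.lower() in text:
                    counter[term] += 1
        counts[book] = counter
    return counts


# Hypothetical toy input.
docs = {
    "Animal Farm": ["Readers of Animal Farm often pick up Brave New World next."],
    "Brave New World": ["Brave New World and Animal Farm are both classics."],
}
counts = cooccurrence_counts(docs, ["Animal Farm", "Brave New World"])
print(counts["Animal Farm"]["Brave New World"])  # 1
```

Passing the list of tags instead of the list of titles gives the corresponding counts between tags and books.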
fiction
classic
novel
paperback
literature
20th century
Favorites
American
fantasy
hardcover
series
science fiction
english
british
american literature
sf
Contemporary Fiction
Humor
contemporary
1001 books
Table 6.17. The 20 most frequently applied tags on LibraryThing.com.
[Plot: Spearman’s rank correlation coefficient as a function of w, with one
curve for each of n = 10, 25, 50, 100, 250 and 500.]
Figure 6.17. Spearman’s correlation coefficient between the computed tags and
the LibraryThing ground truth.
in texts does not hold for these cases.
Future work should therefore focus on the identification of formulations of
tags in unstructured texts. Using an annotated training set of artists and tags we
can learn such formulations. Moreover, currently we assume the tags in the set
I