[Figure 6.10: plot titled "Precision for k-NN Artist Similarity"; precision (0.6 to 0.9) against k (0 to 50); curves: linear, sqrt, baseline.]
Figure 6.10. Precision for the sets of 1995 artists using the three ambiguity estimators.
Musical Artists
In this experiment, I_224 is again the set of all artist names in the list composed by Knees et al. [2004]. This list consists of 14 genres, each with 16 artists.
The genres mentioned in the list are not all suitable for finding co-occurrences, and hence for finding the most appropriate genre for the artists in I_224. For example, the term classical is ambiguous and Alternative Rock/Indie is an infrequent term. We therefore manually rewrote the names of the genres into unambiguous ones (such as classical music) and added some synonyms. After collecting the numbers of co-occurrences of artists and genres, we summed the co-occurrence scores of the synonyms. In this way, for each artist b the numbers of co-occurrences with the terms Indie and Alternative Rock are added to the co-occurrences of b with the genre Alternative Rock/Indie. Although this approach may increase the absolute number of co-occurrences with Alternative Rock/Indie, note that we use a relative measure to determine the most applicable category per artist.
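The synonym merging described above can be sketched as follows. This is a minimal illustration, not the procedure used in the experiments: the synonym table, the function names, and in particular the normalization in `most_applicable_genre` (dividing each count by a per-genre total) are assumptions, since the relative measure itself is not defined in this passage.

```python
from collections import defaultdict

# Hypothetical synonym table: each search term maps to the genre label
# whose co-occurrence count it contributes to.
SYNONYMS = {
    "Indie": "Alternative Rock/Indie",
    "Alternative Rock": "Alternative Rock/Indie",
}


def merge_synonyms(term_counts, synonyms=SYNONYMS):
    """Sum the co-occurrence counts of all terms that denote the same genre."""
    genre_counts = defaultdict(int)
    for term, count in term_counts.items():
        # Terms without an entry in the table are treated as genre labels.
        genre_counts[synonyms.get(term, term)] += count
    return dict(genre_counts)


def most_applicable_genre(artist_counts, genre_totals):
    """Return the genre with the highest relative score for one artist.

    Assumed relative measure: the artist's co-occurrence count with a genre
    divided by the total count observed for that genre.
    """
    scores = {
        genre: count / genre_totals[genre]
        for genre, count in artist_counts.items()
        if genre_totals.get(genre)
    }
    return max(scores, key=scores.get)
```

With a relative score of this kind, a genre whose absolute counts grow because several synonyms are summed does not automatically win, which is the point made in the paragraph above.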
Motivated by the results in [Schedl et al., 2005], for PCM we used the allintitle option in the artist categorization experiment.
For PM we selected the patterns in Table 6.12 for the genre-artist relation from a list of patterns expressing this relation.
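As an illustration of how the patterns of Table 6.12 expand, the sketch below generates the four listed patterns and their variants with acts, musicians and bands, giving 16 phrases per genre-artist pair. The names `TEMPLATES`, `NOUNS` and `pattern_phrases` are hypothetical, and filling both slots and quoting the phrase is an assumption; how the phrases are submitted to the search engine is not specified in this passage.

```python
# The four patterns of Table 6.12, with the noun left as a slot.
TEMPLATES = [
    "{genre} {noun} like {artist}",
    "{genre} {noun} such as {artist}",
    "{genre} {noun} for example {artist}",
    "{artist} and other {genre} {noun}",
]

# "artists" plus the three replacements named in the table caption.
NOUNS = ["artists", "acts", "musicians", "bands"]


def pattern_phrases(genre, artist):
    """Return all quoted pattern phrases for one genre-artist pair."""
    return [
        '"' + template.format(genre=genre, noun=noun, artist=artist) + '"'
        for noun in NOUNS
        for template in TEMPLATES
    ]
```

For example, `pattern_phrases("jazz", "Miles Davis")` starts with `"jazz artists like Miles Davis"`; the artist name is used here purely for illustration.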
For all three methods, we reuse the artist similarities computed in the previous
experiments.
[Genre] artists like [Artist]
[Genre] artists such as [Artist]
[Genre] artists for example [Artist]
[Artist] and other [Genre] artists
Table 6.12. Four of the patterns for the artist-genre relation. In the other patterns, artists is replaced with acts, musicians and bands, respectively.

[Figure 6.11: plot titled "artist-genre categorization"; precision (0.7 to 0.9) against k (0 to 25); curves: dm, pm, pcm.]
Figure 6.11. Precision for the categorization of the musical artists.

Table 6.13 gives the performance of the initial mappings for the three methods (k = 0). We were able to map all artists to a genre. Co-occurrences between genres and artists could thus be found using PCM, PM and DM. Of these, DM performs best. With respect to the preliminary mapping, the method with the smallest number of Google queries performs best.
Using DM, only a few related artists can be found in the documents visited. Increasing k hence does not affect the performance of the final mapping, as the lists of related artists are small (Figure 6.11). In contrast to PCM in particular, large values of k do not reduce the precision.
The performance of the pattern-based method improves strongly when related artists are considered; the best performance is obtained for k = 8. All methods perform best for values of k between 5 and 13. The Rock n’ Roll artists proved to be the most problematic to categorize. The artists in the genres classical, blues and jazz were all correctly categorized with the best scoring settings.
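To make the role of k concrete, the sketch below refines an initial mapping using the k most similar artists and computes the precision of the result. It is an illustration under stated assumptions rather than the procedure used in the experiments: the majority vote, the tie-breaking rule that keeps the initial genre, and the names `final_genre` and `precision` are not taken from the text.

```python
from collections import Counter


def final_genre(artist, initial, neighbours, k):
    """Assumed refinement: majority vote over the artist and its k nearest artists.

    initial    -- dict mapping each artist to its initially assigned genre
    neighbours -- dict mapping each artist to a list of related artists,
                  most similar first
    """
    votes = Counter([initial[artist]])
    # A list shorter than k simply contributes fewer votes.
    votes.update(initial[n] for n in neighbours.get(artist, [])[:k] if n in initial)
    top = max(votes.values())
    tied = [genre for genre, count in votes.items() if count == top]
    # Assumed tie-break: keep the initial genre whenever it is among the winners.
    return initial[artist] if initial[artist] in tied else tied[0]


def precision(predicted, truth):
    """Fraction of artists whose predicted genre equals the true genre."""
    return sum(predicted[a] == truth[a] for a in truth) / len(truth)
```

With k = 0 the function returns the initial mapping unchanged. Because the neighbour list is truncated to its first k entries, a method that yields only short lists of related artists gives the same result for every k beyond the list length, which matches the flat DM behaviour described above.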
With the supervised music artist clustering method discussed in [Knees et al.,
2004] a precision of 87% was obtained using complex machine learning techniques