Information extraction from the web using a search engine Citation for published version (apa)




a by selecting the most applicable label in I_g.

[Figure 6.10: plot "Precision for k-NN Artist Similarity"; x-axis: k (0 to 50); y-axis: precision (0.6 to 0.9); curves: linear, sqrt, baseline.]
Figure 6.10. Precision for the sets of 1995 artists using the three ambiguity estimators.
Musical Artists
In this experiment, I_224 is again the set of all artist names in the list composed by
Knees et al. [2004]. This list consists of 14 genres, each with 16 artists.
Not all genre names mentioned in the list are suitable for finding co-occurrences
with the artists in I_224. For example, the term classical is ambiguous and
Alternative Rock/Indie is an infrequent term. We therefore manually rewrote the
names of the genres into unambiguous ones (such as classical music) and added some
synonyms. After collecting the numbers of co-occurrences of artists and genres, we
summed the scores of the co-occurrences for synonyms. In this way, for each artist b
the numbers of co-occurrences with the terms Indie and Alternative Rock are added to
the number of co-occurrences of b with the genre Alternative Rock/Indie. Although the
absolute number of co-occurrences with Alternative Rock/Indie may increase using this
approach, note that we use a relative measure to determine the most applicable
category per artist.
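The synonym aggregation described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the synonym table, the function names and the example counts are hypothetical, and the relative measure shown (the co-occurrence count divided by the genre's total count over all artists) is only an assumption, since the exact measure is defined elsewhere in the text.

```python
from collections import defaultdict

# Hypothetical synonym table: each query term maps to the genre it counts toward.
SYNONYMS = {
    "Alternative Rock/Indie": "Alternative Rock/Indie",
    "Indie": "Alternative Rock/Indie",
    "Alternative Rock": "Alternative Rock/Indie",
    "classical music": "Classical",
}


def merge_synonyms(term_counts, synonyms=SYNONYMS):
    """Sum the co-occurrence counts of all terms that map to the same genre."""
    genre_counts = defaultdict(int)
    for term, count in term_counts.items():
        # Terms without an entry in the table are treated as their own genre.
        genre_counts[synonyms.get(term, term)] += count
    return dict(genre_counts)


def most_applicable_genre(artist, counts):
    """Return the genre with the highest relative score for the artist.

    counts maps artist -> {genre: co-occurrence count}. The normalization by
    the genre's total count over all artists is an assumed relative measure.
    """
    totals = defaultdict(int)
    for per_genre in counts.values():
        for genre, count in per_genre.items():
            totals[genre] += count

    def score(genre):
        return counts[artist][genre] / totals[genre] if totals[genre] else 0.0

    return max(counts[artist], key=score)


# Made-up counts for two artists, b and c.
raw = {
    "b": {"Indie": 30, "Alternative Rock": 20,
          "Alternative Rock/Indie": 5, "classical music": 60},
    "c": {"Indie": 10, "classical music": 360},
}
merged = {artist: merge_synonyms(terms) for artist, terms in raw.items()}
print(merged["b"])
print(most_applicable_genre("b", merged))
```

In this made-up example artist b has a higher absolute count for the classical genre (60 against 55), yet the relative score favours Alternative Rock/Indie, which illustrates why an increase in absolute counts through synonyms need not distort the categorization.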
Motivated by the results in [Schedl et al., 2005], we used the allintitle option for
PCM in the artist categorization experiment.
For PM, we selected the patterns in Table 6.12 for the genre-artist relation from a
list of patterns expressing this relation.
For all three methods, we reuse the artist similarities computed in the previous
experiments.
In Table 6.13 the performance of the initial mappings can be found for the

6.4 Experimental Results
    baseline          using p_lin               using p_sqrt
Babylon Zoo:
1.  Tool              Juli                      Juli
2.  Live              Chumbawamba               Chumbawamba
3.  Fish              Jamiroquai                Jamiroquai
4.  Juli              Shakira                   Tool
5.  Chumbawamba       Sonic Youth               Shakira
6.  Play              Right Said Fred           Janet Jackson
B12:
1.  Tool              Juli                      Juli
2.  Live              Carl Craig                Carl Craig
3.  Fish              Jamiroquai                Jamiroquai
4.  Juli              Autechre                  Tool
5.  Play              Shakira                   Autechre
6.  Japan             Speedy J.                 Shakira
B. Springsteen:
1.  Neil Young        T. Petty & Heartbreakers  Neil Young
2.  U2                Tom Petty                 T. Petty & Heartbreakers
3.  Bob Dylan         The Afghan Wigs           Tom Petty
4.  Tom Petty         Neil Young                The Afghan Wigs
5.  Tool              Patti Smith               Bob Dylan
6.  The Afghan Wigs   Robert Plant              Patty Smith
Tool:
1.  Freefrom          Mudvayne                  Mudvayne
2.  Mudvayne          Type O Negative           Type O Negative
3.  Racoon            Hothouse Flowers          Hothouse Flowers
4.  Strauss           Massive Attack            Massive Attack
5.  Type O Negative   Nine Inch Nails           Nine Inch Nails
6.  Hothouse Flowers  Dream Theater             Dream Theater
U2:
1.  Hothouse Flowers  Hothouse Flowers          Hothouse Flowers
2.  Radiohead         Radiohead                 Radiohead
3.  Sinéad O’Connor   Sinéad O’Connor           Sinéad O’Connor
4.  Madonna           Coldplay                  Coldplay
5.  Coldplay          Bruce Springsteen         Talking Heads
6.  Elvis Presley     Pearl Jam                 Bruce Springsteen
Table 6.11. Examples of most related artists using the baseline method and the
two alternatives p_lin (3.3) and p_sqrt (3.4).

[Genre] artists like [Artist]
[Genre] artists such as [Artist]
[Genre] artists for example [Artist]
[Artist] and other [Genre] artists
Table 6.12. Four of the patterns for the artist-genre relation. In the other patterns,
artists is replaced with acts, musicians and bands, respectively.
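The pattern set described in the caption of Table 6.12 can be generated mechanically. The sketch below assumes that the full set consists of the four listed patterns combined with each of the four nouns (16 patterns in total); the function names and the genre and artist used in the example are purely illustrative.

```python
# The four patterns listed in Table 6.12.
BASE_PATTERNS = [
    "[Genre] artists like [Artist]",
    "[Genre] artists such as [Artist]",
    "[Genre] artists for example [Artist]",
    "[Artist] and other [Genre] artists",
]

# The noun "artists" and its three replacements named in the caption.
NOUNS = ["artists", "acts", "musicians", "bands"]


def expand_patterns(base_patterns=BASE_PATTERNS, nouns=NOUNS):
    """Create one pattern per (noun, base pattern) combination.

    Only the lowercase word "artists" is replaced; the placeholder
    "[Artist]" is capitalized and singular, so it is left untouched.
    """
    return [p.replace("artists", noun) for noun in nouns for p in base_patterns]


def instantiate(pattern, genre, artist):
    """Fill both placeholders to obtain a concrete phrase."""
    return pattern.replace("[Genre]", genre).replace("[Artist]", artist)


patterns = expand_patterns()
print(len(patterns))
print(instantiate(patterns[1], "jazz", "Some Artist"))
```

In practice one of the two placeholders would be filled in and the other left open for extraction; filling both, as done here, merely shows the resulting phrase.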
[Figure 6.11: plot "artist-genre categorization"; x-axis: k (0 to 25); y-axis: precision (0.7 to 0.9); curves: dm, pm, pcm.]
Figure 6.11. Precision for the categorization of the musical artists.
three methods (k = 0). We were able to map all artists to a genre. Co-occurrences
between genres and artists could thus be found using PCM, PM as well as DM. The
latter performs best. With respect to the preliminary mapping, the method with the
smallest number of Google queries performs best.
Using DM, only few related artists can be found on the documents visited. Increasing
k hence does not affect the performance for the final mapping, as the lists of related
artists are small (Figure 6.11). In contrast to PCM in particular, large values of k
do not deteriorate the precision.
The performance of the pattern-based method strongly improves by considering related
artists; the best performance is obtained for k = 8. All methods perform best for
values of k between 5 and 13. The Rock n’ Roll artists proved to be the most
problematic to categorize. The artists in the genres classical, blues and jazz were
all correctly categorized with the best scoring settings.
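How the k most related artists can influence the final mapping is sketched below. The rule used here is an assumption made for illustration: a majority vote over the artist's own initial genre and the initial genres of its k most related artists, with ties resolved in favour of the initial genre. The actual final mapping is defined elsewhere in the text and may differ; all names and data are hypothetical.

```python
from collections import Counter


def final_genre(artist, initial, related, k):
    """Majority vote over the artist and its k most related artists (assumed rule).

    initial maps artist -> genre from the initial mapping (k = 0);
    related maps artist -> list of related artists, most related first.
    """
    votes = Counter([initial[artist]])
    neighbours = related.get(artist, [])[:k]
    votes.update(initial[n] for n in neighbours if n in initial)
    best = max(votes.values())
    winners = sorted(g for g, v in votes.items() if v == best)
    # Ties are resolved in favour of the artist's own initial genre.
    return initial[artist] if initial[artist] in winners else winners[0]


def precision(predicted, truth):
    """Fraction of artists that were mapped to their correct genre."""
    return sum(predicted[a] == truth[a] for a in truth) / len(truth)


# Made-up data: artist "a" is initially mislabelled as pop.
initial = {"a": "pop", "b": "rock", "c": "rock", "d": "jazz"}
truth = {"a": "rock", "b": "rock", "c": "rock", "d": "jazz"}
related = {"a": ["b", "c", "d"]}

for k in (0, 1, 2):
    predicted = {a: final_genre(a, initial, related, k) for a in initial}
    print(k, precision(predicted, truth))
```

Because the list of related artists is simply truncated to its first k entries, an artist with a short list is unaffected by larger values of k, which is consistent with the behaviour reported for DM.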
With the supervised music artist clustering method discussed in [Knees et al.,
2004] a precision of 87% was obtained using complex machine learning techniques

method
