Data Mining: The Textbook


[Figure 18.6a shows the co-clustered utility matrix with rows U1, U4, U6, U2, U3, U5 and columns GLADIATOR, BEN HUR, SPARTACUS, GODFATHER, GOODFELLAS, SCARFACE. The nonzero (1) entries are concentrated in two blocks: interest group A, in which users U1, U4, and U6 are co-clustered with GLADIATOR, BEN HUR, and SPARTACUS; and interest group B, in which users U2, U3, and U5 are co-clustered with GODFATHER, GOODFELLAS, and SCARFACE. Figure 18.6b shows the corresponding user-item graph with its 2-way partition.]
Figure 18.6: Co-clustering of user-item graph


User-based methods and item-based methods can then be used to make predictions for the missing entries.

The co-clustering approach also has a nice interpretation in terms of the user-item graph. Let G = (Nu ∪ Ni, A) denote the preference graph, where Nu is the set of nodes representing users, and Ni is the set of nodes representing items. An undirected edge exists in A for each nonzero entry of the utility matrix. Then, the co-clustering corresponds to a clustering of this graph structure. The corresponding 2-way graph partition is illustrated in Fig. 18.6b. Because of this interpretation in terms of user-item graphs, the approach is able to exploit item-item and user-user similarity simultaneously. Co-clustering methods are also closely related to latent factor models, such as nonnegative matrix factorization, which simultaneously cluster rows and columns with the use of latent factors.
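As an illustration of this graph interpretation (a sketch, not the book's code): when the utility matrix is perfectly block-structured, as in the example of Fig. 18.6, the co-clusters are simply the connected components of the bipartite user-item graph. The 0-1 pattern below is modeled on the figure, but the exact entries within each block are illustrative assumptions.

```python
from collections import deque

# Utility matrix in the spirit of Figure 18.6: rows are users U1, U4, U6,
# U2, U3, U5 and columns are the six movies; 1 marks a specified entry.
users = ["U1", "U4", "U6", "U2", "U3", "U5"]
items = ["GLADIATOR", "BEN HUR", "SPARTACUS",
         "GODFATHER", "GOODFELLAS", "SCARFACE"]
A = [[1, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 0, 0, 0],
     [0, 0, 0, 1, 1, 1],
     [0, 0, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 0]]

def co_clusters(A):
    """Connected components of the bipartite graph G = (Nu + Ni, A).

    Each nonzero entry A[u][i] is an undirected edge between user node u
    and item node i. For a block-structured matrix, the components are
    exactly the two co-clusters (interest groups A and B) of the figure.
    """
    n, d = len(A), len(A[0])
    seen_u, seen_i, clusters = set(), set(), []
    for start in range(n):
        if start in seen_u:
            continue
        comp_u, comp_i = set(), set()
        queue = deque([("u", start)])
        seen_u.add(start)
        while queue:  # breadth-first search over both node types
            kind, idx = queue.popleft()
            if kind == "u":
                comp_u.add(idx)
                for j in range(d):
                    if A[idx][j] and j not in seen_i:
                        seen_i.add(j)
                        queue.append(("i", j))
            else:
                comp_i.add(idx)
                for u in range(n):
                    if A[u][idx] and u not in seen_u:
                        seen_u.add(u)
                        queue.append(("u", u))
        clusters.append((sorted(comp_u), sorted(comp_i)))
    return clusters

for k, (us, its) in enumerate(co_clusters(A)):
    print(f"co-cluster {k}:",
          [users[u] for u in us], "<->", [items[i] for i in its])
```

On this matrix, the first component pairs U1, U4, U6 with the first three movies, and the second pairs U2, U3, U5 with the last three, matching the two interest-group co-clusters.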


18.5.5 Latent Factor Models


The clustering methods discussed in the previous section use the aggregate properties of the data to make robust predictions. Latent factor models achieve this goal in an even more robust way. The approach can be used either for ratings matrices or for positive preference utility matrices, and it has become increasingly popular in recent years. The key idea behind latent factor models is that many dimensionality reduction and matrix factorization methods summarize the correlations across rows and columns in the form of lower-dimensional vectors, or latent factors. Furthermore, collaborative filtering is essentially a missing data imputation problem in which these correlations are used to make predictions. Therefore, the latent factors become hidden variables that encode the correlations in the data matrix concisely and can be used to make predictions. A robust estimation of the k-dimensional dominant latent factors is often possible even from incompletely specified data, when the value of k is much less than d. This is because the concisely defined latent factors can be estimated accurately from a sparsely specified data matrix, as long as the number of specified entries is large enough.
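To make the low-rank intuition concrete, here is a minimal sketch (not from the book) using NumPy: a toy ratings matrix whose fully specified version would be rank 1 is mean-filled, and the top-k singular vectors are kept as the dominant latent factors. The toy matrix and the mean-filling heuristic are illustrative assumptions; practical systems estimate the factors from the specified entries directly.

```python
import numpy as np

# Toy 4 x 3 ratings matrix; np.nan marks unspecified entries. The fully
# specified matrix would be rank 1 (each user is a scalar multiple of the
# pattern [2, 1, 1]), so k = 1 latent factor should recover the gap.
R = np.array([[4.0, 2.0, np.nan],
              [2.0, 1.0, 1.0],
              [8.0, 4.0, 4.0],
              [4.0, 2.0, 2.0]])

# Illustrative imputation: fill each missing entry with its item (column)
# mean, then keep only the top-k singular vectors as latent factors.
filled = np.where(np.isnan(R), np.nanmean(R, axis=0), R)
U, s, Vt = np.linalg.svd(filled, full_matrices=False)
k = 1
R_hat = (U[:, :k] * s[:k]) @ Vt[:k, :]  # rank-k reconstruction

print(R_hat[0, 2])  # estimate of the missing entry, close to the rank-1 value 2.0
```

The estimate is not exactly 2.0 because the column-mean fill perturbs the dominant singular vector slightly, but it is far closer than the fill value itself, which is the point of the low-rank summary.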


The n users are represented in terms of n corresponding k-dimensional factors, denoted by the vectors U1 . . . Un. The d items are represented by d corresponding k-dimensional




factors, denoted by the vectors I1 . . . Id. The value of k represents the reduced dimensionality of the latent representation. Then, the rating rij for user i and item j is estimated by the dot product of the corresponding latent factors:

    r̂ij ≈ Ui · Ij                                        (18.14)
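A hedged sketch of how factors of this kind might be fit in practice: stochastic gradient descent on the squared error over the specified entries only, with each prediction computed as the dot product of the user and item factors, as in Eq. 18.14. The toy ratings, the hyperparameters, and the regularized update rule are illustrative assumptions, not the book's prescription.

```python
import random

# Sparse ratings as a dict: (user i, item j) -> rij. Three users, three
# items; the (1, 2) entry is unspecified and will be imputed. User U2's
# observed ratings match U1's, so the imputed value should be near 4.
ratings = {(0, 0): 5.0, (0, 1): 3.0, (0, 2): 4.0,
           (1, 0): 5.0, (1, 1): 3.0,
           (2, 0): 1.0, (2, 1): 1.0, (2, 2): 1.0}
n, d, k = 3, 3, 2  # users, items, latent dimensionality

random.seed(0)
# U[i] and I[j] are the k-dimensional latent factors of users and items.
U = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n)]
I = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(d)]

def predict(i, j):
    """Prediction rule of Eq. 18.14: the dot product Ui . Ij."""
    return sum(U[i][f] * I[j][f] for f in range(k))

lr, reg = 0.01, 0.01  # illustrative learning rate and regularization
for _ in range(2000):
    for (i, j), r in ratings.items():
        e = r - predict(i, j)  # error on a specified entry
        for f in range(k):
            u, v = U[i][f], I[j][f]
            # Gradient step on squared error plus L2 regularization.
            U[i][f] += lr * (e * v - reg * u)
            I[j][f] += lr * (e * u - reg * v)

print(predict(1, 2))  # imputed rating for the unspecified (U2, item 3) entry
```

Note that the error is summed only over specified entries; this is what lets the factors be estimated from an incompletely specified matrix in the first place.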




