Data Mining: The Textbook




$r_{ij} \approx U_i \cdot I_j$.




If this relationship holds for every entry of the ratings matrix, then the entire ratings matrix $D = [r_{ij}]_{n \times d}$ can be factorized into two matrices as follows:





$D \approx F_{user} F_{item}^{T}$.  (18.15)

Here $F_{user}$ is an n × k matrix, in which the ith row represents the latent factor $U_i$ for user i. Similarly, $F_{item}$ is a d × k matrix, in which the jth row represents the latent factor $I_j$ for item j. How can these factors be determined? The two key methods for computing these factors are singular value decomposition and matrix factorization, which are discussed in the sections below.
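The relationship between the two factor matrices and the individual ratings can be sketched in a few lines of NumPy. The factor values below are invented purely for illustration, and the helper name predict is hypothetical; the point is only that entry (i, j) of the product of the user factors with the transposed item factors equals the dot product of row i of the user factors with row j of the item factors.

```python
import numpy as np

# Hypothetical latent factors with k = 2 (values invented for illustration).
F_user = np.array([[1.0, 0.5],
                   [0.2, 1.5],
                   [0.9, 0.1]])   # n = 3 users, shape (n, k)
F_item = np.array([[4.0, 1.0],
                   [1.0, 3.0],
                   [3.5, 0.5],
                   [0.5, 2.5]])   # d = 4 items, shape (d, k)

# Matrix of all predicted ratings: F_user F_item^T, shape (n, d).
D_hat = F_user @ F_item.T


def predict(i, j):
    """Predicted rating of user i for item j: the dot product U_i . I_j."""
    return float(F_user[i] @ F_item[j])


print(D_hat.shape)    # (3, 4)
print(predict(0, 1))  # 2.5, i.e. 1.0 * 1.0 + 0.5 * 3.0
```

Storing the two factor matrices takes (n + d) · k numbers rather than n · d, which is one practical reason a small k is attractive.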

18.5.5.1 Singular Value Decomposition


Singular Value Decomposition (SVD) is discussed in detail in Sect. 2.4.3.2 of Chap. 2. The reader is advised to revisit that section before proceeding further. Equation 2.12 of Chap. 2 approximately factorizes the data matrix D into three matrices, and is replicated here:



$D \approx Q_k \Sigma_k P_k^{T}$.  (18.16)

Here, $Q_k$ is an n × k matrix, $\Sigma_k$ is a k × k diagonal matrix, and $P_k$ is a d × k matrix. The main difference from the 2-way factorization format is the diagonal matrix $\Sigma_k$. However, this matrix can be included within the user factors. Therefore, one obtains the following factor matrices:





$F_{user} = Q_k \Sigma_k$  (18.17)

$F_{item} = P_k$.  (18.18)
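As a minimal sketch of Eqs. 18.16 to 18.18, the factors can be read off a truncated SVD computed with numpy.linalg.svd. The ratings matrix below is invented and, importantly, fully specified; the variable names are illustrative and not taken from the book.

```python
import numpy as np

# Small, fully specified ratings matrix (invented values):
# n = 4 users, d = 3 items.
D = np.array([[5.0, 4.0, 1.0],
              [4.0, 5.0, 1.0],
              [1.0, 1.0, 5.0],
              [2.0, 1.0, 4.0]])
k = 2

# Thin SVD: D = Q diag(s) P^T, singular values in decreasing order.
Q, s, Pt = np.linalg.svd(D, full_matrices=False)

# Keep only the top-k singular triplets.
Q_k = Q[:, :k]             # n x k
Sigma_k = np.diag(s[:k])   # k x k
P_k = Pt[:k, :].T          # d x k

F_user = Q_k @ Sigma_k     # Eq. 18.17
F_item = P_k               # Eq. 18.18

# Rank-k approximation of D and the Frobenius norm of the residual.
D_hat = F_user @ F_item.T
err = np.linalg.norm(D - D_hat)
print(F_user.shape, F_item.shape, err)
```

Because only one singular value is discarded in this small example, the residual norm equals that discarded singular value.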

The discussion in Chap. 2 shows that the matrix $Q_k \Sigma_k$ defines the reduced and transformed coordinates of data points in SVD. Thus, each user has a new set of k-dimensional coordinates in a new k-dimensional basis system $P_k$ defined by linear combinations of items. Strictly speaking, SVD is undefined for incomplete matrices, although heuristic approximations are possible. The bibliographic notes provide pointers to methods that are designed to address this issue. Another disadvantage of SVD is its high computational complexity. For nonnegative ratings matrices, PLSA may be used, because it provides a probabilistic factorization similar to SVD.
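The statement that the user factors are coordinates in the basis defined by the item factors can be checked with a short derivation. It assumes a fully specified matrix D with exact SVD D = QΣP^T, where the columns of P are orthonormal and the first k columns of Q and P form the truncated matrices.

```latex
\begin{align*}
D P_k &= Q \Sigma P^{T} P_k
      && \text{(substitute } D = Q \Sigma P^{T}\text{)} \\
      &= Q \Sigma \begin{bmatrix} I_k \\ 0 \end{bmatrix}
      && \text{(columns of } P \text{ are orthonormal)} \\
      &= Q_k \Sigma_k
      && \text{(first } k \text{ columns of } Q \Sigma\text{)} \\
\Rightarrow \quad F_{user} &= D \, F_{item}.
\end{align*}
```

In words, row i of the user factor matrix is obtained by projecting the ratings row of user i onto the k basis vectors stored in the columns of the item factor matrix. This identity relies on every entry of D being known, which is one way to see why incomplete matrices are problematic for SVD.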

18.5.5.2 Matrix Factorization




SVD is a form of matrix factorization. Because there are many different forms of matrix factorization, it is natural to explore whether they can be used for recommendations. The reader is advised to read Sect. 6.8 of Chap. 6 for a review of matrix factorization. Equation 6.30 of that section is replicated here:

$D \approx U \cdot V^{T}$.  (18.19)
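One way such a factorization might be fitted when some ratings are missing is gradient descent on the squared error over the observed entries only. The sketch below is an illustrative assumption, not necessarily the formulation of Sect. 6.8: the function name factorize, the 0/1 mask observed, the learning rate, the step count, and the data are all hypothetical.

```python
import numpy as np


def factorize(D, observed, k=2, lr=0.01, steps=2000, seed=0):
    """Fit D ~ U V^T by gradient descent on the squared error,
    summed over observed entries only (observed is a 0/1 mask)."""
    rng = np.random.default_rng(seed)
    n, d = D.shape
    U = 0.1 * rng.standard_normal((n, k))   # n x k user factors
    V = 0.1 * rng.standard_normal((d, k))   # d x k item factors
    losses = []
    for _ in range(steps):
        E = observed * (D - U @ V.T)        # residuals, zero where unobserved
        losses.append(0.5 * np.sum(E ** 2))
        # Simultaneous step along the negative gradient:
        # dJ/dU = -E V and dJ/dV = -E^T U.
        U, V = U + lr * (E @ V), V + lr * (E.T @ U)
    return U, V, losses


# Invented ratings; a 0 in `observed` marks a missing rating, and the
# corresponding value in D is only a placeholder that is never used.
D = np.array([[5.0, 4.0, 0.0],
              [4.0, 5.0, 1.0],
              [1.0, 0.0, 5.0],
              [2.0, 1.0, 4.0]])
observed = np.array([[1, 1, 0],
                     [1, 1, 1],
                     [1, 0, 1],
                     [1, 1, 1]], dtype=float)

U, V, losses = factorize(D, observed)
D_hat = U @ V.T   # includes predictions for the two missing entries
print(losses[0], losses[-1])
```

Unlike the SVD route, nothing here requires the missing entries to be filled in first. In practice a regularization term is usually added to limit overfitting; it is omitted here to keep the sketch short.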



