Data Mining: The Textbook




$$\text{Pearson}(\overline{X}, \overline{Y}) = \frac{\sum_{i=1}^{s} (x_i - \hat{x}) \cdot (y_i - \hat{y})}{\sqrt{\sum_{i=1}^{s} (x_i - \hat{x})^2} \cdot \sqrt{\sum_{i=1}^{s} (y_i - \hat{y})^2}} \tag{18.12}$$










The Pearson coefficient is computed between the target user and all the other users. The peer group of the target user is defined as the top-k users with the highest Pearson coefficient of correlation with her. Users with very low or negative correlations are also removed from the peer group. The average ratings of each of the (specified) items of this peer group are returned as the recommended ratings. To achieve greater robustness, it is also possible to weight each rating with the Pearson correlation coefficient of its owner while computing the average. This weighted average rating can provide a prediction for the target user. The items with the highest predicted ratings are recommended to the user.
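As a concrete illustration, the peer-group construction described above can be sketched as follows. The dictionary-based ratings layout and the names `pearson` and `peer_group` are assumptions of this sketch, as is the common simplification of summing only over items that both users have rated:

```python
import math

def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items rated by both users.

    ratings_a, ratings_b: dicts mapping item id -> rating.
    Returns 0.0 when fewer than two co-rated items exist.
    """
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0
    # Means are taken over each user's own specified ratings.
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den_a = math.sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
    den_b = math.sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common))
    if den_a == 0 or den_b == 0:
        return 0.0
    return num / (den_a * den_b)

def peer_group(target, all_users, k=2):
    """Top-k users with the highest positive Pearson correlation to target."""
    scores = [(u, pearson(all_users[target], all_users[u]))
              for u in all_users if u != target]
    scores = [(u, s) for u, s in scores if s > 0]  # drop low/negative peers
    scores.sort(key=lambda t: t[1], reverse=True)
    return scores[:k]
```

Note that users with non-positive correlation are filtered out before the top-k selection, mirroring the removal of weakly or negatively correlated users from the peer group.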


The main problem with this approach is that different users may provide ratings on different scales. One user may rate all items highly, whereas another user may rate all items negatively. The raw ratings, therefore, need to be normalized before determining the (weighted) average rating of the peer group. The normalized rating of a user is defined by


608 CHAPTER 18. MINING WEB DATA


subtracting her mean rating from each of her ratings. As before, the weighted average of the normalized rating of an item in the peer group is determined as a normalized prediction. The mean rating of the target user is then added back to the normalized rating prediction to provide a raw rating prediction.
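A minimal sketch of this normalized prediction step, under the same assumed dictionary layout (the function name and the fallback to the target user's own mean when no peer has rated the item are illustrative choices):

```python
def predict_rating(target, item, all_users, peers):
    """Predict target's rating for item from mean-centered peer ratings.

    peers: list of (user, pearson_weight) pairs; only peers who rated
    the item contribute. The target's own mean rating is added back at
    the end to convert the normalized prediction into a raw rating.
    """
    target_mean = sum(all_users[target].values()) / len(all_users[target])
    num, den = 0.0, 0.0
    for user, weight in peers:
        ratings = all_users[user]
        if item not in ratings:
            continue
        mean = sum(ratings.values()) / len(ratings)
        num += weight * (ratings[item] - mean)  # mean-centered rating
        den += abs(weight)
    if den == 0:
        return target_mean  # no usable peer rating: fall back to own mean
    return target_mean + num / den
```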


18.5.2.2 Item-Based Similarity with Ratings


The main conceptual difference from the user-based approach is that peer groups are constructed in terms of items rather than users. Therefore, similarities need to be computed between items (or columns in the ratings matrix). Before computing the similarities between the columns, the ratings matrix is normalized. As in the case of user-based ratings, the average of each row in the ratings matrix is subtracted from that row. Then, the cosine similarity between the normalized ratings U = (u1 . . . us) and V = (v1 . . . vs) of a pair of items (columns) defines the similarity between them:


























$$\text{Cosine}(\overline{U}, \overline{V}) = \frac{\sum_{i=1}^{s} u_i \cdot v_i}{\sqrt{\sum_{i=1}^{s} u_i^2} \cdot \sqrt{\sum_{i=1}^{s} v_i^2}} \tag{18.13}$$










This similarity is referred to as the adjusted cosine similarity, because the ratings are normalized before computing the similarity value.


Consider the case in which the rating of item j for user i needs to be determined. The first step is to determine the top-k most similar items to item j based on the aforementioned adjusted cosine similarity. Among the top-k matching items to item j, the ones for which user i has specified ratings are determined. The weighted average value of these (raw) ratings is reported as the predicted value. The weight of item r in this average is equal to the adjusted cosine similarity between item r and the target item j.

The basic idea is to leverage the user’s own ratings in the final step of making the prediction. For example, in a movie recommendation system, the item peer group will typically be movies of a similar genre. The previous ratings history of the same user on such movies is a very reliable predictor of the interests of that user.
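The item-based procedure can be sketched as follows. The dictionary layout and function names are assumptions, and the sketch simplifies the peer-group search by ranking only the items that the target user has already rated (rather than filtering a global top-k list):

```python
import math

def adjusted_cosine(matrix, item_a, item_b):
    """Adjusted cosine between two item columns of a ratings dict.

    matrix: dict user -> {item: rating}. Ratings are row-centered (each
    user's mean subtracted) before the cosine is taken, and only users
    who rated both items contribute.
    """
    num = da = db = 0.0
    for ratings in matrix.values():
        if item_a in ratings and item_b in ratings:
            mean = sum(ratings.values()) / len(ratings)
            ua, ub = ratings[item_a] - mean, ratings[item_b] - mean
            num += ua * ub
            da += ua * ua
            db += ub * ub
    if da == 0 or db == 0:
        return 0.0
    return num / (math.sqrt(da) * math.sqrt(db))

def predict_item_based(matrix, user, item, k=2):
    """Weighted average of the user's own raw ratings on the k items
    most similar to the target item; similarities serve as weights."""
    sims = [(other, adjusted_cosine(matrix, item, other))
            for other in matrix[user] if other != item]
    sims.sort(key=lambda t: t[1], reverse=True)
    num = den = 0.0
    for other, sim in sims[:k]:
        if sim <= 0:
            continue  # ignore dissimilar items
        num += sim * matrix[user][other]
        den += sim
    return num / den if den else None
```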


18.5.3 Graph-Based Methods


It is possible to use a random walk on the user-item graph, rather than the Pearson correlation coefficient, for defining neighborhoods. Such an approach is sometimes more effective for sparse ratings matrices. A bipartite user-item graph G = (Nu ∪ Ni, A) is constructed, where Nu is the set of nodes representing users, and Ni is the set of nodes representing items. An undirected edge exists in A between a user and an item for each nonzero entry in the utility matrix. For example, the user-item graph for both utility matrices of Fig. 18.4 is illustrated in Fig. 18.5. One can use either the personalized PageRank or the SimRank method to determine the k most similar users to a given user for user-based collaborative filtering. Similarly, one can use this method to determine the k most similar items to a given item for item-based collaborative filtering. The other steps of user-based collaborative filtering and item-based collaborative filtering remain the same.
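Building the bipartite graph from the utility matrix is straightforward. This sketch assumes a dictionary-of-dictionaries utility matrix and represents the graph as an adjacency list; the function name is an illustrative choice:

```python
def build_user_item_graph(utility):
    """Undirected bipartite graph: one edge per specified (user, item)
    rating. utility: dict user -> {item: rating}."""
    adj = {}
    for user, ratings in utility.items():
        adj.setdefault(user, [])
        for item in ratings:
            adj[user].append(item)            # user -> item edge
            adj.setdefault(item, []).append(user)  # item -> user edge
    return adj
```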


A more general approach is to view the problem as a positive and negative link prediction problem on the user-item graph. In such cases, the user-item graph is augmented with positive or negative weights on edges. The normalized rating of a user for an item, after subtracting the user-mean, can be viewed as either a positive or negative weight on the edge. For example, consider the graph constructed from the ratings matrix of Fig. 18.4(a). The edge between user U1 and the item Gladiator would become a negative edge because



18.5. RECOMMENDER SYSTEMS

609

Figure 18.5: Preference graph for utility matrices of Fig. 18.4




U1 clearly dislikes the movie Gladiator. The corresponding network would become a signed network. Therefore, the recommendation problem is that of predicting high positive weight edges between users and items in a signed network. A simpler version of the link-prediction problem with only positive links is discussed in Sect. 19.5 of Chap. 19. Refer to the bibliographic notes for link prediction methods with positive and negative links. The merit of the link prediction approach is that it can also leverage the available links between different users in a setting where they are connected by social network links. In such cases, the user-item graph no longer remains bipartite.

When users specify only positive preference values for items, the problem becomes simplified because most link prediction methods are designed for positive links. One can also use random walks on the user-item graph to perform recommendations, rather than using them only to define neighborhoods. For example, in the case of Fig. 18.4b, the same user-item graph of Fig. 18.5 can be used in conjunction with a random-walk approach. This preference graph can be used to provide different types of recommendations:





  1. The top ranking items for the user i can be determined by returning the item nodes with the largest PageRank in a random walk with restart at node i.




  2. The top ranking users for the item j can be determined by returning the user nodes with the largest PageRank in a random walk with restart at node j.

The choice of the restart probability regulates the trade-off between the global popularity of the recommended item/user and the specificity of the recommendation to a particular user/item. For example, consider the case when items need to be recommended to user i. A low teleportation probability will favor the recommendation of popular items which are favored by many users. Increasing the teleportation probability will make the recommendation more specific to user i.
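A random walk with restart can be sketched as a simple power iteration on the adjacency-list graph. The function names, the fixed iteration count, and the default restart probability of 0.15 are illustrative assumptions:

```python
def rwr_scores(adjacency, restart_node, alpha=0.15, iters=100):
    """Personalized PageRank by power iteration on an undirected graph.

    adjacency: dict node -> list of neighbor nodes (the bipartite
    user/item graph built from the nonzero entries of the utility
    matrix). alpha: restart (teleportation) probability back to
    restart_node; larger alpha makes scores more specific to it.
    """
    nodes = list(adjacency)
    scores = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            # Each node passes (1 - alpha) of its mass to its neighbors.
            share = (1 - alpha) * scores[n] / len(adjacency[n])
            for nbr in adjacency[n]:
                nxt[nbr] += share
        nxt[restart_node] += alpha  # teleport back to the restart node
        scores = nxt
    return scores

def recommend_items(adjacency, user, items, top=2):
    """Item nodes ranked by steady-state visit probability in a walk
    restarting at the user node; items the user already rated are
    filtered out."""
    scores = rwr_scores(adjacency, user)
    rated = set(adjacency[user])
    ranked = sorted((i for i in items if i not in rated),
                    key=lambda i: scores[i], reverse=True)
    return ranked[:top]
```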


18.5.4 Clustering Methods

One weakness of neighborhood-based methods is the scale of the computation that needs to be performed. For each user, one typically has to perform computations that are proportional to at least the number of nonzero entries in the matrix. Furthermore, these computations need to be performed over all users to provide recommendations to different users.




This can be extremely slow. Therefore, a question arises as to whether clustering methods can be used to speed up the computations. Clustering also helps address the issue of data sparsity to some extent.


Clustering methods are exactly analogous to neighborhood-based methods, except that the clustering is performed as a preprocessing step to define the peer groups. These peer groups are then used for making recommendations. The clusters can be defined either on users, or on items. Thus, they can be used to make either user-user similarity recommendations or item-item similarity recommendations. For brevity, only the user-user recommendation approach is described here, although the item-item recommendation approach is exactly analogous. The clustering approach works as follows:





  1. Cluster all the users into ng groups of users using any clustering algorithm.




  2. For any user i, compute the average (normalized) rating of the specified items in its cluster. Report these ratings for user i, after transforming back to the raw value.

The item-item recommendation approach is similar, except that the clustering is applied to the columns rather than the rows. The clusters define the groups of similar items (or implicitly pseudo-genres). The final step of computing the rating for a user-item combination is similar to the case of neighborhood-based methods. After the clustering has been performed, it is generally very efficient to determine all the ratings. It remains to be explained how the clustering is performed.
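The final averaging step of the clustering approach, which computes the average normalized rating within the target user's cluster and then adds the target's own mean back, can be sketched as follows (the function name and dictionary layout are illustrative assumptions):

```python
def cluster_predict(cluster, target_ratings):
    """Average the mean-centered ratings of each item over the users in
    the target's cluster, then add the target's own mean back to obtain
    raw rating predictions.

    cluster: list of rating dicts for the users in the target's cluster.
    target_ratings: the target user's own rating dict.
    """
    target_mean = sum(target_ratings.values()) / len(target_ratings)
    sums, counts = {}, {}
    for ratings in cluster:
        mean = sum(ratings.values()) / len(ratings)  # user's own mean
        for item, r in ratings.items():
            sums[item] = sums.get(item, 0.0) + (r - mean)
            counts[item] = counts.get(item, 0) + 1
    # Raw prediction = target mean + average normalized cluster rating.
    return {item: target_mean + sums[item] / counts[item] for item in sums}
```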


18.5.4.1 Adapting k-Means Clustering


To cluster the ratings matrix, it is possible to adapt many of the clustering methods discussed in Chap. 6. However, it is important to adapt these methods to sparsely specified incomplete data sets. Methods such as k-means and Expectation Maximization may be used on the normalized ratings matrix. In the case of the k-means method, there are two major differences from the description of Chap. 6:





  1. In an iteration of k-means, each dimension of a centroid is computed by averaging only over the values specified by the cluster members in that dimension. Furthermore, the centroid itself may not be fully specified.




  2. The distance between a data point and a centroid is computed only over the specified dimensions in both. Furthermore, the distance is divided by the number of such dimensions in order to fairly compare different data points.

The ratings matrix should be normalized before applying the clustering method.
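The two adaptations above can be combined into a sketch of k-means for incomplete ratings data. The function names, the random initialization from data points, and the fixed iteration count are illustrative assumptions; the input points would be the mean-centered rows of the ratings matrix:

```python
import random

def centroid(points, items):
    """Average each dimension over the members that specify it; a
    dimension no member specifies is left out of the centroid."""
    c = {}
    for item in items:
        vals = [p[item] for p in points if item in p]
        if vals:
            c[item] = sum(vals) / len(vals)
    return c

def sparse_dist(p, q):
    """Squared distance over mutually specified dimensions, divided by
    their count so sparse and dense points compare fairly."""
    common = set(p) & set(q)
    if not common:
        return float('inf')
    return sum((p[i] - q[i]) ** 2 for i in common) / len(common)

def kmeans_sparse(points, k, items, iters=20, seed=0):
    """k-means on incomplete data: points are dicts item -> rating."""
    rng = random.Random(seed)
    centers = [dict(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda j: sparse_dist(p, centers[j]))
            clusters[j].append(p)
        # Recompute centers; an empty cluster keeps its old center.
        centers = [centroid(c, items) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return clusters
```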


18.5.4.2 Adapting Co-Clustering


The co-clustering approach is described in Sect. 13.3.3.1 of Chap. 13. Co-clustering is well suited to discovery of neighborhood sets of users and items in sparse matrices. The specified entries are treated as 1s and the unspecified entries are treated as 0s for co-clustering. An example of the co-clustering approach, as applied to the utility matrix of Fig. 18.4b, is illustrated in Fig. 18.6a. In this case, only a 2-way co-clustering is shown for simplicity. The co-clustering approach cleanly partitions the users and items into groups with a clear correspondence to each other. Therefore, user-neighborhoods and item-neighborhoods are discovered simultaneously. After the neighborhoods have been defined, the aforementioned







