Data Mining: The Textbook




  1. Web content information: This information corresponds to the Web documents and links created by users. The documents are linked to one another with hypertext links. Thus, the content information contains two components that can be mined either together or in isolation.




    • Document data: The document data are extracted from the pages on the World Wide Web. Some of these extraction methods are discussed in Chap. 13.




    • Linkage data: The Web can be viewed as a massive graph, in which the pages correspond to nodes, and the linkages correspond to edges between nodes. This linkage information can be used in many ways, such as searching the Web or determining the similarity between nodes.




  2. Web usage data: These data correspond to the patterns of user activity that are enabled by Web applications. These patterns can be of various types:




C. C. Aggarwal, Data Mining: The Textbook, Chapter 18: Mining Web Data, DOI 10.1007/978-3-319-14142-8_18, © Springer International Publishing Switzerland 2015



  • Web transactions, ratings, and user feedback: Web users frequently buy various types of items on the Web, or express their affinity for specific products in the form of ratings. In such cases, the buying behavior and/or ratings can be leveraged to make inferences about the preferences of different users. In some cases, the user feedback is provided in the form of textual user reviews that are referred to as opinions.




  • Web logs: User browsing behavior is captured in the form of Web logs that are typically maintained at most Web sites. This browsing information can be leveraged to make inferences about user activity.
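
As a rough illustration of how Web logs can be turned into browsing patterns, the following Python sketch parses a few log lines and groups the requested pages by client. It assumes the Common Log Format that many Web servers emit; the sample lines, host addresses, paths, and the function names `parse_line`, `paths_by_host`, and `page_counts` are all hypothetical and are not taken from the text.

```python
import re
from collections import Counter, defaultdict

# Hypothetical log lines in Common Log Format; hosts and paths are invented.
LOG_LINES = [
    '10.0.0.1 - - [01/Mar/2015:10:00:00 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Mar/2015:10:00:05 +0000] "GET /products HTTP/1.1" 200 1024',
    '10.0.0.2 - - [01/Mar/2015:10:00:07 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Mar/2015:10:00:09 +0000] "GET /cart HTTP/1.1" 200 256',
    'this line is malformed and will be skipped',
]

# host, identity, user, [timestamp], "method path protocol", status, size
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)$'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None


def paths_by_host(lines):
    """Group requested paths by client host, preserving the order in the log."""
    visited = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # skip lines that are not in the assumed format
        visited[record["host"]].append(record["path"])
    return dict(visited)


def page_counts(lines):
    """Count how often each path was requested across all hosts."""
    records = (parse_line(line) for line in lines)
    return Counter(r["path"] for r in records if r is not None)


if __name__ == "__main__":
    print(paths_by_host(LOG_LINES))
    print(page_counts(LOG_LINES).most_common(1))
```

Grouping by host is only a crude stand-in for identifying users; real logs typically need further cleaning before any inference about user activity is drawn from them.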

These diverse data types determine the types of applications that are common on the Web. Corresponding to the different data types, the applications are likewise either content-centric or usage-centric.





  1. Content-centric applications: The documents and links on the Web are used in various applications such as search, clustering, and classification. Some examples of such applications are as follows:




    • Data mining applications: Web documents are used in conjunction with different types of data mining applications such as clustering and categorization. Such applications are used frequently by Web portals for organizing pages.




    • Web crawling and resource discovery: The Web is a tremendous resource of knowledge about documents on various subjects. However, this resource is widely distributed on the Internet, and it needs to be discovered and stored at a single place to make inferences.




    • Web search: The goal in Web search is to discover high-quality, relevant documents in response to a user-specified set of keywords. As will be evident later, the notions of quality and relevance are defined both by the linkage and content structure of the documents.




    • Web linkage mining: In these applications, either actual or logical representations of linkage structure on the Web are mined for useful insights. Examples of logical representations of Web structure include social and information networks. Social networks are linked networks of users, whereas information networks are linked networks of users and objects.
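
The graph view of the Web used throughout this list (pages as nodes, hyperlinks as edges) can be made concrete with a small Python sketch. The toy graph and the helper names `in_degree`, `jaccard_similarity`, and `discover` are hypothetical. Counting in-links, comparing out-link sets with the Jaccard coefficient, and visiting pages breadth-first are simple stand-ins chosen for illustration; they are not the specific methods the chapter develops.

```python
from collections import deque

# Hypothetical toy Web graph: each page maps to the set of pages it links to.
WEB_GRAPH = {
    "A": {"B", "C"},
    "B": {"C", "D"},
    "C": {"D"},
    "D": set(),
    "E": {"C", "D"},
}


def in_degree(graph):
    """Count incoming links per page, a crude linkage-based signal."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


def jaccard_similarity(graph, u, v):
    """Similarity of two pages based on the overlap of their out-links."""
    links_u, links_v = graph[u], graph[v]
    union = links_u | links_v
    if not union:
        return 0.0  # neither page links anywhere
    return len(links_u & links_v) / len(union)


def discover(graph, seed):
    """Breadth-first traversal from a seed page, as a crawler might follow links."""
    order = [seed]
    visited = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        # Sorting makes the visiting order deterministic for this example.
        for neighbor in sorted(graph.get(page, ())):
            if neighbor not in visited:
                visited.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


if __name__ == "__main__":
    print(in_degree(WEB_GRAPH))
    print(jaccard_similarity(WEB_GRAPH, "B", "E"))
    print(discover(WEB_GRAPH, "A"))
```

In this toy graph, page E is never discovered from seed A because no page links to it, which is one reason resource discovery on the Web usually starts from many seeds.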




