3. Page Similarity Assumptions
In our study, we do not compute the similarity between two pages ourselves; instead, we use Google’s similarity functions to retrieve our results. Because we are concerned with representing the similarity relationship rather than computing it, we do not examine the methods used to compute similarity. We do, however, make a few assumptions about the similarity relationship. In particular, we assume that:
i) A web page is maximally similar to itself.
ii) If page A is similar to page B, then page B is similar to page A.
Assumption (ii), which is a symmetry assumption, is non-trivial and does not always hold. For example, a query for pages similar to www.sausage.com yields Microsoft FrontPage’s website as the top result. However, a query for pages similar to the FrontPage website does not return sausage.com among the top 10 results.
As we will discuss later, it is not clear what the correct behavior of our application ought to be under these circumstances; we choose to leave this for future investigation.
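To make the two assumptions concrete, the following is a minimal sketch of how one might check them against a set of top-k similarity results. It does not query Google; the `TOP_SIMILAR` table, the `*.example` page names, and the functions `is_similar` and `asymmetric_pairs` are all hypothetical and serve only as an illustration, with self-similarity treated as given and mutual similarity tested explicitly.

```python
from typing import Dict, List, Tuple

# Toy stand-in for a "similar pages" lookup.
# Hypothetical data for illustration only, not real query results.
TOP_SIMILAR: Dict[str, List[str]] = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example", "b.example"],
    "d.example": [],
}


def is_similar(source: str, target: str, k: int = 10) -> bool:
    """Return True if target counts as similar to source.

    A page is always treated as similar to itself (assumption i);
    otherwise target must appear in source's top-k result list.
    """
    if source == target:
        return True
    return target in TOP_SIMILAR.get(source, [])[:k]


def asymmetric_pairs(k: int = 10) -> List[Tuple[str, str]]:
    """Return sorted (a, b) pairs where b is in a's top-k list
    but a is not in b's top-k list, i.e. where symmetry fails."""
    pairs = []
    for a, neighbours in TOP_SIMILAR.items():
        for b in neighbours[:k]:
            if not is_similar(b, a, k):
                pairs.append((a, b))
    return sorted(pairs)


if __name__ == "__main__":
    for a, b in asymmetric_pairs():
        print(f"{b} is similar to {a}, but not the reverse")
```

Note that whether a pair counts as asymmetric depends on the cutoff k: in this toy data, shrinking the result list to the top result alone produces more violations than the top-10 cutoff used in the example above.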