| S'inscrire | Se connecter | FAQ | [?] |
On the Resemblance and Containment of Documentsby: Broder
(1998)
|
Reviews
[Write a review of this article]
There are no reviews of this article
Find related articles from these CiteULike users
Find related articles with these CiteULike tags
Résumé Given two documents A and B we define two mathematical notions: their resemblance r(A, B)andtheircontainment c(A, B) that seem to capture well the informal notions of "roughly the same" and "roughly contained." The basic idea is to reduce these issues to set intersection problems that can be easily evaluated by a process of random sampling that can be done independently for each document. Furthermore, the resemblance can be evaluated using a fixed size sample for each document.
BibTeX record
RIS record