Peer-to-peer information retrieval using shared-content clustering

Irad Ben-Gal, Yuval Shavitt, Ela Weinsberg, Udi Weinsberg

Research output: Contribution to journal › Article › peer-review

Abstract

Peer-to-peer (p2p) networks are used by millions for searching and downloading content. Recently, clustering algorithms were shown to be useful for helping users find content in large networks. Yet, many of these algorithms overlook the fact that p2p networks follow graph models with a power-law node degree distribution. This paper studies the obtained clusters when applying clustering algorithms on power-law graphs and their applicability for finding content. Driven by the observed deficiencies, a simple yet efficient clustering algorithm is proposed, which targets a relaxed optimization of a minimal distance distribution of each cluster with a size balancing scheme. A comparative analysis using a song-similarity graph collected from 1.2 million Gnutella users reveals that commonly used efficiency measures often overlook search and recommendation applicability issues and provide the wrong impression that the resulting clusters are well suited for these tasks. We show that the proposed algorithm performs well on various measures that are well suited for the domain.

Original language: English
Pages (from-to): 383-408
Number of pages: 26
Journal: Knowledge and Information Systems
Volume: 39
Issue number: 2
State: Published - May 2014

Keywords

  • Clustering
  • Data mining
  • Peer-to-peer
  • Recommender systems
