Peer-to-peer (p2p) networks are used by millions for searching and downloading content. Recently, clustering algorithms were shown to be useful for helping users find content in large networks. Yet, many of these algorithms overlook the fact that p2p networks follow graph models with a power-law node degree distribution. This paper studies the obtained clusters when applying clustering algorithms on power-law graphs and their applicability for finding content. Driven by the observed deficiencies, a simple yet efficient clustering algorithm is proposed, which targets a relaxed optimization of a minimal distance distribution of each cluster with a size balancing scheme. A comparative analysis using a song-similarity graph collected from 1.2 million Gnutella users reveals that commonly used efficiency measures often overlook search and recommendation applicability issues and provide the wrong impression that the resulting clusters are well suited for these tasks. We show that the proposed algorithm performs well on various measures that are well suited for the domain.
- Data mining
- Recommender systems