A new, structurally nonredundant, diverse data set of protein-protein interfaces and its implications

Ozlem Keskin*, Chung Jung Tsai, Haim Wolfson, Ruth Nussinov

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Here, we present a diverse, structurally nonredundant data set of two-chain protein-protein interfaces derived from the PDB. Using a sequence order-independent structural comparison algorithm and hierarchical clustering, 3799 interface clusters are obtained. These yield 103 clusters with at least five nonhomologous members. We divide the clusters into three types. In Type I clusters, the global structures of the chains from which the interfaces are derived are also similar. This cluster type is expected because, in general, related proteins associate in similar ways. In Type II, the interfaces are similar; however, remarkably, the overall structures and functions of the chains are different. The functional spectrum is broad, from enzymes/inhibitors to immunoglobulins and toxins. The fact that structurally different monomers associate in similar ways suggests "good" binding architectures. This observation extends a paradigm in protein science: it is well known that proteins with similar structures may have different functions. Here, we show that this paradigm extends to interfaces. In Type III clusters, only one side of the interface is similar across the cluster. This structurally nonredundant data set provides rich data for studies of protein-protein interactions and recognition, cellular networks, and drug design. In particular, it may be useful in addressing the difficult question of which ways of interacting are favorable for proteins.
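The three cluster types described above can be read as a small decision rule. The following is a minimal sketch of that rule only, not the authors' implementation: the `ClusterSummary` class, its field names, and the `cluster_type` function are hypothetical, and it assumes the structural comparisons have already been reduced to yes/no judgments per cluster.

```python
from dataclasses import dataclass


@dataclass
class ClusterSummary:
    """Hypothetical per-cluster summary; all field names are illustrative."""
    both_sides_similar: bool    # both sides of the interface are similar across members
    one_side_similar: bool      # at least one side of the interface is similar across members
    global_folds_similar: bool  # the parent chains also share their overall structure


def cluster_type(c: ClusterSummary) -> str:
    """Map a cluster summary to Type I, II, or III as defined in the abstract."""
    if c.both_sides_similar:
        # Similar interfaces: Type I if the chains are globally similar too,
        # Type II if the overall chain structures differ.
        return "I" if c.global_folds_similar else "II"
    if c.one_side_similar:
        # Only one side of the interface is shared across the cluster.
        return "III"
    return "unclassified"
```

For example, a cluster whose interfaces match on both sides while the parent chains have different overall structures would be labeled Type II under this sketch.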

Original language: English
Pages (from-to): 1043-1055
Number of pages: 13
Journal: Protein Science
Issue number: 4
State: Published - Apr 2004


  • Data set of interfaces
  • Motifs, protein-protein interactions
  • Protein binding
  • Protein interfaces
  • Protein-protein association

