Discriminating physiological from non-physiological interfaces in structures of protein complexes: A community-wide study

Hugo Schweke, Qifang Xu, Gerardo Tauriello, Lorenzo Pantolini, Torsten Schwede, Frédéric Cazals, Alix Lhéritier, Juan Fernandez-Recio, Luis Angel Rodríguez-Lumbreras, Ora Schueler-Furman, Julia K. Varga, Brian Jiménez-García, Manon F. Réau, Alexandre M.J.J. Bonvin, Castrense Savojardo, Pier Luigi Martelli, Rita Casadio, Jérôme Tubiana, Haim J. Wolfson, Romina Oliva, Didier Barradas-Bautista, Tiziana Ricciardelli, Luigi Cavallo, Česlovas Venclovas, Kliment Olechnovič, Raphael Guerois, Jessica Andreani, Juliette Martin, Xiao Wang, Genki Terashi, Daipayan Sarkar, Charles Christoffer, Tunde Aderinwale, Jacob Verburgt, Daisuke Kihara, Anthony Marchand, Bruno E. Correia, Rui Duan, Liming Qiu, Xianjin Xu, Shuang Zhang, Xiaoqin Zou, Sucharita Dey, Roland L. Dunbrack, Emmanuel D. Levy*, Shoshana J. Wodak*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. A simple consensus score generated using the best performing score of each of the 13 groups, and a cross-validated Random Forest (RF) classifier were created. Both approaches showed excellent performance, with an area under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming individual scores developed by different groups. Additionally, AlphaFold2 engines recalled the physiological dimers with significantly higher accuracy than the non-physiological set, lending support to the reliability of our benchmark dataset annotations. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.
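
The abstract does not say how the consensus score was computed or how the ROC analysis was run. As a rough illustration only, the sketch below assumes each group's best score is rank-normalized and the consensus is the plain average, then evaluates the area under the ROC curve with the pairwise (Mann-Whitney) definition. The function names, the two "groups", and the six toy complexes are all hypothetical and are not taken from the study.

```python
from typing import Dict, List, Sequence


def rank_normalize(values: Sequence[float]) -> List[float]:
    """Map raw scores to [0, 1] by average rank so different scales are comparable."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values; ties share their average rank.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0
        i = j + 1
    return [r / (n - 1) if n > 1 else 0.5 for r in ranks]


def consensus_score(per_group: Dict[str, Sequence[float]]) -> List[float]:
    """Assumed consensus: mean of rank-normalized scores, one score per group."""
    normalized = [rank_normalize(v) for v in per_group.values()]
    n_items = len(normalized[0])
    return [sum(col[i] for col in normalized) / len(normalized) for i in range(n_items)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented): 1 = physiological, 0 = non-physiological.
# Scores are oriented so that higher means "more likely physiological".
labels = [1, 1, 1, 0, 0, 0]
scores = {
    "group_a": [0.9, 0.8, 0.3, 0.4, 0.2, 0.1],
    "group_b": [5.0, 2.0, 4.0, 1.0, 3.0, 0.5],
}
consensus = consensus_score(scores)

for name, values in scores.items():
    print(name, roc_auc(labels, values))
print("consensus", roc_auc(labels, consensus))
```

In this toy case each individual score misranks one pair, while the averaged ranks separate the two classes, which is the general intuition behind combining scores; it says nothing about the actual performance figures reported above.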

Original language: English
Article number: 2200323
Journal: Proteomics
Volume: 23
Issue number: 17
State: Published - Sep 2023

Funding

Funders: Funder number
ACCIÓ
Alexander Botzki
Arnold Bortman Family Foundation
Estate of Fannie Sherr
Estelle Funk Foundation
Government of Catalonia's Agency for Business Competitiveness
National Science Foundation: ASDI.2016.043, CMMI1825941, MCB1925643, DBI2003635, DMS2151678, MCB2146026
National Institutes of Health: R01GM133840
National Institute of General Medical Sciences: R35GM136409
Universität Basel
Merle S. Cahn Foundation
European Research Council: 819318
Department of Biotechnology, Ministry of Science and Technology, India
Nederlandse Organisatie voor Wetenschappelijk Onderzoek: 718.015.001
Israel Academy of Sciences and Humanities: 860517, 301/2021
Israel Science Foundation
Vlaams Instituut voor Biotechnologie
Ministerio de Ciencia e Innovación: PID2019-110167RB-I00 / AEI / 10.13039/501100011033
Horizon 2020: 801342, 823830
Swiss Institute of Bioinformatics

    Keywords

    • crystal contacts
    • homodimers
    • potential energy
    • protein interactions
    • protein structure
