TY - JOUR
T1 - A genomic mutational constraint map using variation in 76,156 human genomes
AU - Genome Aggregation Database Consortium
AU - Chen, Siwei
AU - Francioli, Laurent C.
AU - Goodrich, Julia K.
AU - Collins, Ryan L.
AU - Kanai, Masahiro
AU - Wang, Qingbo
AU - Alföldi, Jessica
AU - Watts, Nicholas A.
AU - Vittal, Christopher
AU - Gauthier, Laura D.
AU - Poterba, Timothy
AU - Wilson, Michael W.
AU - Tarasova, Yekaterina
AU - Phu, William
AU - Grant, Riley
AU - Yohannes, Mary T.
AU - Koenig, Zan
AU - Farjoun, Yossi
AU - Banks, Eric
AU - Donnelly, Stacey
AU - Gabriel, Stacey
AU - Gupta, Namrata
AU - Ferriera, Steven
AU - Tolonen, Charlotte
AU - Novod, Sam
AU - Bergelson, Louis
AU - Roazen, David
AU - Ruano-Rubio, Valentin
AU - Covarrubias, Miguel
AU - Llanwarne, Christopher
AU - Petrillo, Nikelle
AU - Wade, Gordon
AU - Jeandet, Thibault
AU - Munshi, Ruchi
AU - Tibbetts, Kathleen
AU - Abreu, Maria
AU - Aguilar Salinas, Carlos A.
AU - Ahmad, Tariq
AU - Albert, Christine M.
AU - Ardissino, Diego
AU - Armean, Irina M.
AU - Atkinson, Elizabeth G.
AU - Atzmon, Gil
AU - Barnard, John
AU - Baxter, Samantha M.
AU - Beaugerie, Laurent
AU - Benjamin, Emelia J.
AU - Benjamin, David
AU - Boehnke, Michael
AU - Turner, Dan
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive licence to Springer Nature Limited.
PY - 2024/1/4
Y1 - 2024/1/4
AB - The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders 1–4, but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD)—the largest public open-access human genome allele frequency reference dataset—and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.
UR - http://www.scopus.com/inward/record.url?scp=85180828283&partnerID=8YFLogxK
U2 - 10.1038/s41586-023-06045-0
DO - 10.1038/s41586-023-06045-0
M3 - Article
C2 - 38057664
AN - SCOPUS:85180828283
SN - 0028-0836
VL - 625
SP - 92
EP - 100
JO - Nature
JF - Nature
IS - 7993
ER -