TY - JOUR
T1 - Leveraging genomic diversity for discovery in an electronic health record linked biobank
T2 - the UCLA ATLAS Community Health Initiative
AU - UCLA Precision Health Data Discovery Repository Working Group, UCLA Precision Health ATLAS Working Group
AU - Johnson, Ruth
AU - Ding, Yi
AU - Venkateswaran, Vidhya
AU - Bhattacharya, Arjun
AU - Boulier, Kristin
AU - Chiu, Alec
AU - Knyazev, Sergey
AU - Schwarz, Tommer
AU - Freund, Malika
AU - Zhan, Lingyu
AU - Burch, Kathryn S.
AU - Caggiano, Christa
AU - Hill, Brian
AU - Rakocz, Nadav
AU - Balliu, Brunilda
AU - Denny, Christopher T.
AU - Sul, Jae Hoon
AU - Zaitlen, Noah
AU - Arboleda, Valerie A.
AU - Halperin, Eran
AU - Sankararaman, Sriram
AU - Butte, Manish J.
AU - Lajonchere, Clara
AU - Geschwind, Daniel H.
AU - Pasaniuc, Bogdan
N1 - Publisher Copyright: © 2022, The Author(s).
PY - 2022/12
Y1 - 2022/12
N2 - Background: Large medical centers in urban areas, like Los Angeles, care for a diverse patient population and offer the potential to study the interplay between genetic ancestry and social determinants of health. Here, we explore the implications of genetic ancestry within the University of California, Los Angeles (UCLA) ATLAS Community Health Initiative, an ancestrally diverse biobank of genomic data linked with de-identified electronic health records (EHRs) of UCLA Health patients (N=36,736). Methods: We quantify the extensive continental and subcontinental genetic diversity within the ATLAS data through principal component analysis, identity-by-descent, and genetic admixture. We assess the relationship between genetically inferred ancestry (GIA) and >1500 EHR-derived phenotypes (phecodes). Finally, we demonstrate the utility of genetic data linked with EHR to perform ancestry-specific and multi-ancestry genome and phenome-wide scans across a broad set of disease phenotypes. Results: We identify 5 continental-scale GIA clusters including European American (EA), African American (AA), Hispanic Latino American (HL), South Asian American (SAA) and East Asian American (EAA) individuals and 7 subcontinental GIA clusters within the EAA GIA corresponding to Chinese American, Vietnamese American, and Japanese American individuals. Although we broadly find that self-identified race/ethnicity (SIRE) is highly correlated with GIA, we still observe marked differences between the two, emphasizing that the populations defined by these two criteria are not analogous. We find a total of 259 significant associations between continental GIA and phecodes even after accounting for individuals’ SIRE, demonstrating that for some phenotypes, GIA provides information not already captured by SIRE. GWAS identifies significant associations for liver disease in the 22q13.31 locus across the HL and EAA GIA groups (HL p-value=2.32×10^-16, EAA p-value=6.73×10^-11). A subsequent PheWAS at the top SNP reveals significant associations with neurologic and neoplastic phenotypes specifically within the HL GIA group. Conclusions: Overall, our results explore the interplay between SIRE and GIA within a disease context and underscore the utility of studying the genomes of diverse individuals through biobank-scale genotyping linked with EHR-based phenotyping.
KW - Biobank
KW - Electronic health records
KW - Genetic ancestry
KW - Genome-wide association studies
KW - Phenome-wide association studies
UR - http://www.scopus.com/inward/record.url?scp=85138127594&partnerID=8YFLogxK
U2 - 10.1186/s13073-022-01106-x
DO - 10.1186/s13073-022-01106-x
M3 - Article
C2 - 36085083
AN - SCOPUS:85138127594
SN - 1756-994X
VL - 14
JO - Genome Medicine
JF - Genome Medicine
IS - 1
M1 - 104
ER -