Matches in SemOpenAlex for { <https://semopenalex.org/work/W3025794503> ?p ?o ?g. }
- W3025794503 endingPage "4457" @default.
- W3025794503 startingPage "4449" @default.
- W3025794503 abstract "ABSTRACT. Motivation: Principal component analysis (PCA) of genetic data is routinely used to infer ancestry and control for population structure in various genetic analyses. However, conducting PCA analyses can be complicated and has several potential pitfalls. These pitfalls include (i) capturing linkage disequilibrium (LD) structure instead of population structure, (ii) projected PCs that suffer from shrinkage bias, (iii) detecting sample outliers and (iv) uneven population sizes. In this work, we explore these potential issues when using PCA, and present efficient solutions to these. Following applications to the UK Biobank and the 1000 Genomes project datasets, we make recommendations for best practices and provide efficient and user-friendly implementations of the proposed solutions in R packages bigsnpr and bigutilsr. Results: For example, we find that PC19–PC40 in the UK Biobank capture complex LD structure rather than population structure. Using our automatic algorithm for removing long-range LD regions, we recover 16 PCs that capture population structure only. Therefore, we recommend using only 16–18 PCs from the UK Biobank to account for population structure confounding. We also show how to use PCA to restrict analyses to individuals of homogeneous ancestry. Finally, when projecting individual genotypes onto the PCA computed from the 1000 Genomes project data, we find a shrinkage bias that becomes large for PC5 and beyond. We then demonstrate how to obtain unbiased projections efficiently using bigsnpr. Overall, we believe this work would be of interest for anyone using PCA in their analyses of genetic data, as well as for other omics data. Availability and implementation: R packages bigsnpr and bigutilsr can be installed from either CRAN or GitHub (see https://github.com/privefl/bigsnpr). A tutorial on the steps to perform PCA on 1000G data is available at https://privefl.github.io/bigsnpr/articles/bedpca.html. All code used for this paper is available at https://github.com/privefl/paper4-bedpca/tree/master/code. Supplementary information: Supplementary data are available at Bioinformatics online." @default.
- W3025794503 created "2020-05-21" @default.
- W3025794503 creator A5033000330 @default.
- W3025794503 creator A5036805387 @default.
- W3025794503 creator A5039794476 @default.
- W3025794503 creator A5044233598 @default.
- W3025794503 creator A5078389898 @default.
- W3025794503 date "2020-05-16" @default.
- W3025794503 modified "2023-10-14" @default.
- W3025794503 title "Efficient toolkit implementing best practices for principal component analysis of population genetic data" @default.
- W3025794503 cites W1966775465 @default.
- W3025794503 cites W1974611538 @default.
- W3025794503 cites W1980431326 @default.
- W3025794503 cites W1989638282 @default.
- W3025794503 cites W1992085420 @default.
- W3025794503 cites W2009588715 @default.
- W3025794503 cites W2024753568 @default.
- W3025794503 cites W2027455260 @default.
- W3025794503 cites W2039792137 @default.
- W3025794503 cites W2040730345 @default.
- W3025794503 cites W2047165046 @default.
- W3025794503 cites W2049454545 @default.
- W3025794503 cites W2069735056 @default.
- W3025794503 cites W2086062071 @default.
- W3025794503 cites W2099085143 @default.
- W3025794503 cites W2102213696 @default.
- W3025794503 cites W2104549677 @default.
- W3025794503 cites W2107916366 @default.
- W3025794503 cites W2108169091 @default.
- W3025794503 cites W2127288683 @default.
- W3025794503 cites W2134857847 @default.
- W3025794503 cites W2155496693 @default.
- W3025794503 cites W2157752701 @default.
- W3025794503 cites W2163516107 @default.
- W3025794503 cites W2168354474 @default.
- W3025794503 cites W2284253967 @default.
- W3025794503 cites W2484383958 @default.
- W3025794503 cites W2794694102 @default.
- W3025794503 cites W2895486342 @default.
- W3025794503 cites W2938965511 @default.
- W3025794503 cites W2949231000 @default.
- W3025794503 cites W2951349772 @default.
- W3025794503 cites W2951456052 @default.
- W3025794503 cites W2963655370 @default.
- W3025794503 cites W3003644143 @default.
- W3025794503 cites W3012250061 @default.
- W3025794503 cites W3019415451 @default.
- W3025794503 cites W3032683009 @default.
- W3025794503 doi "https://doi.org/10.1093/bioinformatics/btaa520" @default.
- W3025794503 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/7750941" @default.
- W3025794503 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/32415959" @default.
- W3025794503 hasPublicationYear "2020" @default.
- W3025794503 type Work @default.
- W3025794503 sameAs 3025794503 @default.
- W3025794503 citedByCount "66" @default.
- W3025794503 countsByYear W30257945032020 @default.
- W3025794503 countsByYear W30257945032021 @default.
- W3025794503 countsByYear W30257945032022 @default.
- W3025794503 countsByYear W30257945032023 @default.
- W3025794503 crossrefType "journal-article" @default.
- W3025794503 hasAuthorship W3025794503A5033000330 @default.
- W3025794503 hasAuthorship W3025794503A5036805387 @default.
- W3025794503 hasAuthorship W3025794503A5039794476 @default.
- W3025794503 hasAuthorship W3025794503A5044233598 @default.
- W3025794503 hasAuthorship W3025794503A5078389898 @default.
- W3025794503 hasBestOaLocation W30257945031 @default.
- W3025794503 hasConcept C104317684 @default.
- W3025794503 hasConcept C116567970 @default.
- W3025794503 hasConcept C124101348 @default.
- W3025794503 hasConcept C135763542 @default.
- W3025794503 hasConcept C144024400 @default.
- W3025794503 hasConcept C149923435 @default.
- W3025794503 hasConcept C154945302 @default.
- W3025794503 hasConcept C185592680 @default.
- W3025794503 hasConcept C197754878 @default.
- W3025794503 hasConcept C198531522 @default.
- W3025794503 hasConcept C27438332 @default.
- W3025794503 hasConcept C2908647359 @default.
- W3025794503 hasConcept C35605836 @default.
- W3025794503 hasConcept C41008148 @default.
- W3025794503 hasConcept C43617362 @default.
- W3025794503 hasConcept C54355233 @default.
- W3025794503 hasConcept C60644358 @default.
- W3025794503 hasConcept C79337645 @default.
- W3025794503 hasConcept C86803240 @default.
- W3025794503 hasConceptScore W3025794503C104317684 @default.
- W3025794503 hasConceptScore W3025794503C116567970 @default.
- W3025794503 hasConceptScore W3025794503C124101348 @default.
- W3025794503 hasConceptScore W3025794503C135763542 @default.
- W3025794503 hasConceptScore W3025794503C144024400 @default.
- W3025794503 hasConceptScore W3025794503C149923435 @default.
- W3025794503 hasConceptScore W3025794503C154945302 @default.
- W3025794503 hasConceptScore W3025794503C185592680 @default.
- W3025794503 hasConceptScore W3025794503C197754878 @default.
- W3025794503 hasConceptScore W3025794503C198531522 @default.
- W3025794503 hasConceptScore W3025794503C27438332 @default.
- W3025794503 hasConceptScore W3025794503C2908647359 @default.
- W3025794503 hasConceptScore W3025794503C35605836 @default.