Matches in SemOpenAlex for { <https://semopenalex.org/work/W3047023009> ?p ?o ?g. }
- W3047023009 abstract "An increasingly important scenario in population genetics is when a large cohort has been genotyped using a low-resolution approach (e.g. microarrays, exome capture, short-read WGS), from which a few individuals are selected for resequencing using a more comprehensive approach, especially long-read sequencing. The subset of individuals selected should ensure that the captured genetic diversity is fully representative and includes variants across all subpopulations. For example, research on human variation has historically focused on individuals with European ancestry, but this represents a small fraction of the overall diversity. To address this goal, SVCollector ( https://github.com/fritzsedlazeck/SVCollector ) identifies the optimal subset of individuals for resequencing. SVCollector analyzes a population-level VCF file from a low-resolution genotyping study. It then computes a ranked list of samples that maximizes the total number of variants present in a subset of a given size. To solve this optimization problem, SVCollector implements a fast greedy heuristic and an exact algorithm using integer linear programming. We apply SVCollector to simulated data, 2504 human genomes from the 1000 Genomes Project, and 3024 genomes from the 3K Rice Genomes Project, and show that the rankings it computes are more representative than widely used naive strategies. Notably, we show that when selecting an optimal subset of 100 samples in these two cohorts, SVCollector identifies individuals from every subpopulation, while naive methods yield an unbalanced selection. Finally, we show that the number of variants present in cohorts of different sizes selected using this approach follows a power-law distribution that is naturally related to the population genetic concept of the allele frequency spectrum, allowing us to estimate the diversity present with increasing numbers of samples." @default.
- W3047023009 created "2020-08-10" @default.
- W3047023009 creator A5004006041 @default.
- W3047023009 creator A5008484513 @default.
- W3047023009 creator A5012483156 @default.
- W3047023009 creator A5037440162 @default.
- W3047023009 creator A5043158509 @default.
- W3047023009 creator A5058192453 @default.
- W3047023009 creator A5073871292 @default.
- W3047023009 creator A5077891969 @default.
- W3047023009 creator A5082851461 @default.
- W3047023009 date "2020-08-06" @default.
- W3047023009 modified "2023-09-25" @default.
- W3047023009 title "SVCollector: Optimized sample selection for cost-efficient long-read population sequencing" @default.
- W3047023009 cites W1912672559 @default.
- W3047023009 cites W1984360602 @default.
- W3047023009 cites W1984390786 @default.
- W3047023009 cites W1993803315 @default.
- W3047023009 cites W2002438422 @default.
- W3047023009 cites W2025738895 @default.
- W3047023009 cites W2049729688 @default.
- W3047023009 cites W2055615325 @default.
- W3047023009 cites W2082967637 @default.
- W3047023009 cites W2091200439 @default.
- W3047023009 cites W2104549677 @default.
- W3047023009 cites W2109494273 @default.
- W3047023009 cites W2143996311 @default.
- W3047023009 cites W2150550043 @default.
- W3047023009 cites W2169421412 @default.
- W3047023009 cites W2236623899 @default.
- W3047023009 cites W2303542186 @default.
- W3047023009 cites W2471249732 @default.
- W3047023009 cites W2607218014 @default.
- W3047023009 cites W2800434159 @default.
- W3047023009 cites W2950121474 @default.
- W3047023009 cites W2990344913 @default.
- W3047023009 cites W3022783334 @default.
- W3047023009 cites W3036928921 @default.
- W3047023009 cites W3209624622 @default.
- W3047023009 cites W4251304755 @default.
- W3047023009 doi "https://doi.org/10.1101/2020.08.06.240390" @default.
- W3047023009 hasPublicationYear "2020" @default.
- W3047023009 type Work @default.
- W3047023009 sameAs 3047023009 @default.
- W3047023009 citedByCount "2" @default.
- W3047023009 countsByYear W30470230092021 @default.
- W3047023009 crossrefType "posted-content" @default.
- W3047023009 hasAuthorship W3047023009A5004006041 @default.
- W3047023009 hasAuthorship W3047023009A5008484513 @default.
- W3047023009 hasAuthorship W3047023009A5012483156 @default.
- W3047023009 hasAuthorship W3047023009A5037440162 @default.
- W3047023009 hasAuthorship W3047023009A5043158509 @default.
- W3047023009 hasAuthorship W3047023009A5058192453 @default.
- W3047023009 hasAuthorship W3047023009A5073871292 @default.
- W3047023009 hasAuthorship W3047023009A5077891969 @default.
- W3047023009 hasAuthorship W3047023009A5082851461 @default.
- W3047023009 hasBestOaLocation W30470230091 @default.
- W3047023009 hasConcept C104317684 @default.
- W3047023009 hasConcept C10590036 @default.
- W3047023009 hasConcept C111919701 @default.
- W3047023009 hasConcept C11413529 @default.
- W3047023009 hasConcept C119857082 @default.
- W3047023009 hasConcept C127705205 @default.
- W3047023009 hasConcept C135763542 @default.
- W3047023009 hasConcept C141231307 @default.
- W3047023009 hasConcept C144024400 @default.
- W3047023009 hasConcept C149923435 @default.
- W3047023009 hasConcept C153209595 @default.
- W3047023009 hasConcept C16671776 @default.
- W3047023009 hasConcept C197077220 @default.
- W3047023009 hasConcept C199360897 @default.
- W3047023009 hasConcept C2908647359 @default.
- W3047023009 hasConcept C31467283 @default.
- W3047023009 hasConcept C41008148 @default.
- W3047023009 hasConcept C501734568 @default.
- W3047023009 hasConcept C51823790 @default.
- W3047023009 hasConcept C54355233 @default.
- W3047023009 hasConcept C70721500 @default.
- W3047023009 hasConcept C81917197 @default.
- W3047023009 hasConcept C86803240 @default.
- W3047023009 hasConcept C97137487 @default.
- W3047023009 hasConcept C97425143 @default.
- W3047023009 hasConceptScore W3047023009C104317684 @default.
- W3047023009 hasConceptScore W3047023009C10590036 @default.
- W3047023009 hasConceptScore W3047023009C111919701 @default.
- W3047023009 hasConceptScore W3047023009C11413529 @default.
- W3047023009 hasConceptScore W3047023009C119857082 @default.
- W3047023009 hasConceptScore W3047023009C127705205 @default.
- W3047023009 hasConceptScore W3047023009C135763542 @default.
- W3047023009 hasConceptScore W3047023009C141231307 @default.
- W3047023009 hasConceptScore W3047023009C144024400 @default.
- W3047023009 hasConceptScore W3047023009C149923435 @default.
- W3047023009 hasConceptScore W3047023009C153209595 @default.
- W3047023009 hasConceptScore W3047023009C16671776 @default.
- W3047023009 hasConceptScore W3047023009C197077220 @default.
- W3047023009 hasConceptScore W3047023009C199360897 @default.
- W3047023009 hasConceptScore W3047023009C2908647359 @default.
- W3047023009 hasConceptScore W3047023009C31467283 @default.
- W3047023009 hasConceptScore W3047023009C41008148 @default.
- W3047023009 hasConceptScore W3047023009C501734568 @default.