Matches in SemOpenAlex for { <https://semopenalex.org/work/W2895830560> ?p ?o ?g. }
- W2895830560 abstract "ABSTRACT Next-generation sequencing technology (NGS) enables discovery of nearly all genetic variants present in a genome. A subset of these variants, however, may have poor sequencing quality due to limitations in sequencing technology or in variant calling algorithms. In genetic studies that analyze a large number of sequenced individuals, it is critical to detect and remove those variants with poor quality as they may cause spurious findings. In this paper, we present a statistical approach for performing quality control on variants identified from NGS data by combining a traditional filtering approach and a machine learning approach. Our method uses information on sequencing quality such as sequencing depth, genotyping quality, and GC contents to predict whether a certain variant is likely to contain errors. To evaluate our method, we applied it to two whole-genome sequencing datasets where one dataset consists of related individuals from families while the other consists of unrelated individuals. Results indicate that our method outperforms widely used methods for performing quality control on variants such as VQSR of GATK by considerably improving the quality of variants to be included in the analysis. Our approach is also very efficient, and hence can be applied to large sequencing datasets. We conclude that combining a machine learning algorithm trained with sequencing quality information and the filtering approach is an effective approach to perform quality control on genetic variants from sequencing data.
Author Summary Genetic disorders can be caused by many types of genetic mutations, including common and rare single nucleotide variants, structural variants, insertions and deletions. Nowadays, next generation sequencing (NGS) technology allows us to identify various genetic variants that are associated with diseases. However, variants detected by NGS might have poor sequencing quality due to biases and errors in sequencing technologies and analysis tools. Therefore, it is critical to remove variants with low quality, which could cause spurious findings in follow-up analyses. Previously, people applied either hard filters or machine learning models for variant quality control (QC), which failed to filter out those variants accurately. Here, we developed a statistical tool, ForestQC, for variant QC by combining a filtering approach and a machine learning approach. We applied ForestQC to one family-based whole genome sequencing (WGS) dataset and one general case-control WGS dataset, to evaluate our method. Results show that ForestQC outperforms widely used methods for variant QC by considerably improving the quality of variants. Also, ForestQC is very efficient and scalable to large-scale sequencing datasets. Our study indicates that combining filtering approaches and machine learning approaches enables effective variant QC." @default.
- W2895830560 created "2018-10-26" @default.
- W2895830560 creator A5004736211 @default.
- W2895830560 creator A5007441251 @default.
- W2895830560 creator A5025278422 @default.
- W2895830560 creator A5026769077 @default.
- W2895830560 creator A5053471653 @default.
- W2895830560 creator A5062316104 @default.
- W2895830560 creator A5086531000 @default.
- W2895830560 date "2018-10-16" @default.
- W2895830560 modified "2023-10-16" @default.
- W2895830560 title "ForestQC: quality control on genetic variants from next-generation sequencing data using random forest" @default.
- W2895830560 cites W1544712555 @default.
- W2895830560 cites W1882576502 @default.
- W2895830560 cites W1912672559 @default.
- W2895830560 cites W1963666037 @default.
- W2895830560 cites W1965092590 @default.
- W2895830560 cites W1967824604 @default.
- W2895830560 cites W1971584645 @default.
- W2895830560 cites W1987493079 @default.
- W2895830560 cites W2015759847 @default.
- W2895830560 cites W2023497813 @default.
- W2895830560 cites W2031296193 @default.
- W2895830560 cites W2032415634 @default.
- W2895830560 cites W2033173420 @default.
- W2895830560 cites W2033288453 @default.
- W2895830560 cites W2044539313 @default.
- W2895830560 cites W2053218599 @default.
- W2895830560 cites W2053725906 @default.
- W2895830560 cites W2056104146 @default.
- W2895830560 cites W2058401000 @default.
- W2895830560 cites W2087464200 @default.
- W2895830560 cites W2091583677 @default.
- W2895830560 cites W2097804771 @default.
- W2895830560 cites W2101357408 @default.
- W2895830560 cites W2104549677 @default.
- W2895830560 cites W2104595023 @default.
- W2895830560 cites W2113547573 @default.
- W2895830560 cites W2114847949 @default.
- W2895830560 cites W2119180969 @default.
- W2895830560 cites W2124465358 @default.
- W2895830560 cites W2127003470 @default.
- W2895830560 cites W2129559300 @default.
- W2895830560 cites W2131088968 @default.
- W2895830560 cites W2131187246 @default.
- W2895830560 cites W2134638008 @default.
- W2895830560 cites W2141459724 @default.
- W2895830560 cites W2143992683 @default.
- W2895830560 cites W2145406188 @default.
- W2895830560 cites W2147733682 @default.
- W2895830560 cites W2149992227 @default.
- W2895830560 cites W2157057843 @default.
- W2895830560 cites W2160444236 @default.
- W2895830560 cites W2163924952 @default.
- W2895830560 cites W2168133698 @default.
- W2895830560 cites W2171777347 @default.
- W2895830560 cites W2295828257 @default.
- W2895830560 cites W2417778132 @default.
- W2895830560 cites W2523787414 @default.
- W2895830560 cites W2779975541 @default.
- W2895830560 cites W2855142678 @default.
- W2895830560 cites W2911964244 @default.
- W2895830560 cites W2951529834 @default.
- W2895830560 doi "https://doi.org/10.1101/444828" @default.
- W2895830560 hasPublicationYear "2018" @default.
- W2895830560 type Work @default.
- W2895830560 sameAs 2895830560 @default.
- W2895830560 citedByCount "3" @default.
- W2895830560 countsByYear W28958305602018 @default.
- W2895830560 countsByYear W28958305602019 @default.
- W2895830560 countsByYear W28958305602020 @default.
- W2895830560 crossrefType "posted-content" @default.
- W2895830560 hasAuthorship W2895830560A5004736211 @default.
- W2895830560 hasAuthorship W2895830560A5007441251 @default.
- W2895830560 hasAuthorship W2895830560A5025278422 @default.
- W2895830560 hasAuthorship W2895830560A5026769077 @default.
- W2895830560 hasAuthorship W2895830560A5053471653 @default.
- W2895830560 hasAuthorship W2895830560A5062316104 @default.
- W2895830560 hasAuthorship W2895830560A5086531000 @default.
- W2895830560 hasBestOaLocation W28958305601 @default.
- W2895830560 hasConcept C104317684 @default.
- W2895830560 hasConcept C111472728 @default.
- W2895830560 hasConcept C124101348 @default.
- W2895830560 hasConcept C135763542 @default.
- W2895830560 hasConcept C138885662 @default.
- W2895830560 hasConcept C162324750 @default.
- W2895830560 hasConcept C16671776 @default.
- W2895830560 hasConcept C176217482 @default.
- W2895830560 hasConcept C21547014 @default.
- W2895830560 hasConcept C2779346075 @default.
- W2895830560 hasConcept C2779530757 @default.
- W2895830560 hasConcept C2993967602 @default.
- W2895830560 hasConcept C31467283 @default.
- W2895830560 hasConcept C41008148 @default.
- W2895830560 hasConcept C501734568 @default.
- W2895830560 hasConcept C51679486 @default.
- W2895830560 hasConcept C54355233 @default.
- W2895830560 hasConcept C70721500 @default.
- W2895830560 hasConcept C86803240 @default.
- W2895830560 hasConceptScore W2895830560C104317684 @default.