Matches in SemOpenAlex for { <https://semopenalex.org/work/W3100102095> ?p ?o ?g. }
- W3100102095 abstract "Background: Recent advances in sequencing technologies have led to an explosion in the number of genomes available, but accurate genome annotation remains a major challenge. The prediction of protein-coding genes in eukaryotic genomes is especially problematic, due to their complex exon–intron structures. Even the best eukaryotic gene prediction algorithms can make serious errors that will significantly affect subsequent analyses. Results: We first investigated the prevalence of gene prediction errors in a large set of 176,478 proteins from ten primate proteomes available in public databases. Using the well-studied human proteins as a reference, a total of 82,305 potential errors were detected, including 44,001 deletions, 27,289 insertions and 11,015 mismatched segments where part of the correct protein sequence is replaced with an alternative erroneous sequence. We then focused on the mismatched sequence errors that cause particular problems for downstream applications. A detailed characterization allowed us to identify the potential causes for the gene misprediction in approximately half (5446) of these cases. As a proof-of-concept, we also developed a simple method which allowed us to propose improved sequences for 603 primate proteins. Conclusions: Gene prediction errors in primate proteomes affect up to 50% of the sequences. Major causes of errors include undetermined genome regions, genome sequencing or assembly issues, and limitations in the models used to represent gene exon–intron structures. Nevertheless, existing genome sequences can still be exploited to improve protein sequence quality. Perspectives of the work include the characterization of other types of gene prediction errors, as well as the development of a more comprehensive algorithm for protein sequence error correction." @default.
- W3100102095 created "2020-11-23" @default.
- W3100102095 creator A5024826286 @default.
- W3100102095 creator A5025255544 @default.
- W3100102095 creator A5039921905 @default.
- W3100102095 creator A5057426644 @default.
- W3100102095 creator A5076376874 @default.
- W3100102095 creator A5091010060 @default.
- W3100102095 date "2020-11-10" @default.
- W3100102095 modified "2023-10-06" @default.
- W3100102095 title "Understanding the causes of errors in eukaryotic protein-coding gene prediction: a case study of primate proteomes" @default.
- W3100102095 cites W1841334989 @default.
- W3100102095 cites W1864675704 @default.
- W3100102095 cites W2030607567 @default.
- W3100102095 cites W2056524415 @default.
- W3100102095 cites W2060797027 @default.
- W3100102095 cites W2111797812 @default.
- W3100102095 cites W2129620046 @default.
- W3100102095 cites W2130395351 @default.
- W3100102095 cites W2136126520 @default.
- W3100102095 cites W2142678478 @default.
- W3100102095 cites W2167142288 @default.
- W3100102095 cites W2173732482 @default.
- W3100102095 cites W2178351989 @default.
- W3100102095 cites W2311051041 @default.
- W3100102095 cites W2463304596 @default.
- W3100102095 cites W2538372025 @default.
- W3100102095 cites W2565541933 @default.
- W3100102095 cites W2766001711 @default.
- W3100102095 cites W2792034759 @default.
- W3100102095 cites W2805909216 @default.
- W3100102095 cites W2899354740 @default.
- W3100102095 cites W2900629010 @default.
- W3100102095 cites W2903241977 @default.
- W3100102095 cites W2927961455 @default.
- W3100102095 cites W2947085902 @default.
- W3100102095 cites W2947780575 @default.
- W3100102095 cites W2949891274 @default.
- W3100102095 cites W2950859273 @default.
- W3100102095 cites W2965072475 @default.
- W3100102095 cites W2971299355 @default.
- W3100102095 cites W2977428858 @default.
- W3100102095 cites W2980539212 @default.
- W3100102095 cites W2988396470 @default.
- W3100102095 cites W2996758863 @default.
- W3100102095 cites W3015796860 @default.
- W3100102095 cites W3136918052 @default.
- W3100102095 doi "https://doi.org/10.1186/s12859-020-03855-1" @default.
- W3100102095 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/7656754" @default.
- W3100102095 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/33172385" @default.
- W3100102095 hasPublicationYear "2020" @default.
- W3100102095 type Work @default.
- W3100102095 sameAs 3100102095 @default.
- W3100102095 citedByCount "15" @default.
- W3100102095 countsByYear W31001020952021 @default.
- W3100102095 countsByYear W31001020952022 @default.
- W3100102095 countsByYear W31001020952023 @default.
- W3100102095 crossrefType "journal-article" @default.
- W3100102095 hasAuthorship W3100102095A5024826286 @default.
- W3100102095 hasAuthorship W3100102095A5025255544 @default.
- W3100102095 hasAuthorship W3100102095A5039921905 @default.
- W3100102095 hasAuthorship W3100102095A5057426644 @default.
- W3100102095 hasAuthorship W3100102095A5076376874 @default.
- W3100102095 hasAuthorship W3100102095A5091010060 @default.
- W3100102095 hasBestOaLocation W31001020951 @default.
- W3100102095 hasConcept C104317684 @default.
- W3100102095 hasConcept C104397665 @default.
- W3100102095 hasConcept C105565629 @default.
- W3100102095 hasConcept C141231307 @default.
- W3100102095 hasConcept C150194340 @default.
- W3100102095 hasConcept C197077220 @default.
- W3100102095 hasConcept C36823959 @default.
- W3100102095 hasConcept C54355233 @default.
- W3100102095 hasConcept C70721500 @default.
- W3100102095 hasConcept C86803240 @default.
- W3100102095 hasConcept C89566754 @default.
- W3100102095 hasConcept C91779695 @default.
- W3100102095 hasConcept C94671646 @default.
- W3100102095 hasConcept C95371953 @default.
- W3100102095 hasConceptScore W3100102095C104317684 @default.
- W3100102095 hasConceptScore W3100102095C104397665 @default.
- W3100102095 hasConceptScore W3100102095C105565629 @default.
- W3100102095 hasConceptScore W3100102095C141231307 @default.
- W3100102095 hasConceptScore W3100102095C150194340 @default.
- W3100102095 hasConceptScore W3100102095C197077220 @default.
- W3100102095 hasConceptScore W3100102095C36823959 @default.
- W3100102095 hasConceptScore W3100102095C54355233 @default.
- W3100102095 hasConceptScore W3100102095C70721500 @default.
- W3100102095 hasConceptScore W3100102095C86803240 @default.
- W3100102095 hasConceptScore W3100102095C89566754 @default.
- W3100102095 hasConceptScore W3100102095C91779695 @default.
- W3100102095 hasConceptScore W3100102095C94671646 @default.
- W3100102095 hasConceptScore W3100102095C95371953 @default.
- W3100102095 hasFunder F4320320883 @default.
- W3100102095 hasIssue "1" @default.
- W3100102095 hasLocation W31001020951 @default.
- W3100102095 hasLocation W31001020952 @default.
- W3100102095 hasLocation W31001020953 @default.
- W3100102095 hasLocation W31001020954 @default.
- W3100102095 hasLocation W31001020955 @default.