Matches in SemOpenAlex for { <https://semopenalex.org/work/W4253194480> ?p ?o ?g. }
Showing items 1 to 63 of 63 with 100 items per page.
- W4253194480 abstract "Background: Recent advances in sequencing technologies have led to an explosion in the number of genomes available, but accurate genome annotation remains a major challenge. The prediction of protein-coding genes in eukaryotic genomes is especially problematic, due to their complex exon-intron structures. Even the best eukaryotic gene prediction algorithms can make serious errors that will significantly affect subsequent analyses. Results: We first investigated the prevalence of gene prediction errors in a large set of 176,478 proteins from ten primate proteomes available in public databases. Using the well-studied human proteins as a reference, a total of 82,305 potential errors were detected, including 44,001 deletions, 27,289 insertions and 11,015 mismatched segments where part of the correct protein sequence is replaced with an alternative erroneous sequence. We then focused on the mismatched sequence errors that cause particular problems for downstream applications. A detailed characterization allowed us to identify the potential causes for the gene misprediction in approximately half (5446) of these cases. As a proof-of-concept, we also developed a simple method which allowed us to propose improved sequences for 603 primate proteins. Conclusions: Gene prediction errors in primate proteomes affect up to 50% of the sequences. Major causes of errors include undetermined genome regions, genome sequencing or assembly issues, and limitations in the models used to represent gene exon-intron structures. Nevertheless, existing genome sequences can still be exploited to improve protein sequence quality. Perspectives of the work include the characterization of other types of gene prediction errors, as well as the development of a more comprehensive algorithm for protein sequence error correction." @default.
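The abstract sorts every discrepancy against the human reference into one of three classes, and the counts are consistent: 44,001 deletions + 27,289 insertions + 11,015 mismatched segments = 82,305 potential errors. The sketch below illustrates that three-way classification under stated assumptions. It is not the authors' method: Python's standard `difflib.SequenceMatcher` stands in for a real protein alignment, and `classify_errors` and `OPCODE_TO_ERROR` are hypothetical names introduced only for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical mapping of difflib opcodes onto the three error classes in the abstract.
# "delete": reference residues are absent from the predicted protein.
# "insert": the predicted protein carries extra residues not in the reference.
# "replace": a reference segment is swapped for a different (erroneous) segment.
OPCODE_TO_ERROR = {
    "delete": "deletion",
    "insert": "insertion",
    "replace": "mismatch",
}


def classify_errors(reference: str, predicted: str) -> list[tuple[str, str, str]]:
    """Return (error_type, reference_segment, predicted_segment) tuples.

    Hypothetical helper: difflib is a stand-in for a proper protein alignment,
    so it only finds exact matching blocks and uses no substitution matrix.
    """
    # autojunk=False stops difflib from ignoring frequent residues in long sequences.
    matcher = SequenceMatcher(a=reference, b=predicted, autojunk=False)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical residues, nothing to report
        errors.append((OPCODE_TO_ERROR[tag], reference[i1:i2], predicted[j1:j2]))
    return errors


if __name__ == "__main__":
    # Toy sequences, not data from the study.
    print(classify_errors("ACDEFGHIKL", "ACDEFKL"))     # [('deletion', 'GHI', '')]
    print(classify_errors("ACDEFKL", "ACDEFWWKL"))      # [('insertion', '', 'WW')]
    print(classify_errors("ACDEFGHIKL", "ACDEFWWWKL"))  # [('mismatch', 'GHI', 'WWW')]
```

A realistic pipeline would replace `difflib` with a scored protein alignment, because exact-block matching can place ambiguous gaps differently from a substitution-matrix aligner.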
- W4253194480 created "2022-05-12" @default.
- W4253194480 creator A5024826286 @default.
- W4253194480 creator A5025255544 @default.
- W4253194480 creator A5035018837 @default.
- W4253194480 creator A5039921905 @default.
- W4253194480 creator A5057426644 @default.
- W4253194480 creator A5076376874 @default.
- W4253194480 date "2020-08-05" @default.
- W4253194480 modified "2023-09-28" @default.
- W4253194480 title "Understanding the Causes of Errors in Eukaryotic Protein-coding Gene Prediction: A Case Study of Primate Proteomes" @default.
- W4253194480 doi "https://doi.org/10.21203/rs.3.rs-50810/v1" @default.
- W4253194480 hasPublicationYear "2020" @default.
- W4253194480 type Work @default.
- W4253194480 citedByCount "0" @default.
- W4253194480 crossrefType "posted-content" @default.
- W4253194480 hasAuthorship W4253194480A5024826286 @default.
- W4253194480 hasAuthorship W4253194480A5025255544 @default.
- W4253194480 hasAuthorship W4253194480A5035018837 @default.
- W4253194480 hasAuthorship W4253194480A5039921905 @default.
- W4253194480 hasAuthorship W4253194480A5057426644 @default.
- W4253194480 hasAuthorship W4253194480A5076376874 @default.
- W4253194480 hasBestOaLocation W42531944801 @default.
- W4253194480 hasConcept C104317684 @default.
- W4253194480 hasConcept C104397665 @default.
- W4253194480 hasConcept C105565629 @default.
- W4253194480 hasConcept C141231307 @default.
- W4253194480 hasConcept C197077220 @default.
- W4253194480 hasConcept C36823959 @default.
- W4253194480 hasConcept C54355233 @default.
- W4253194480 hasConcept C70721500 @default.
- W4253194480 hasConcept C86803240 @default.
- W4253194480 hasConcept C91779695 @default.
- W4253194480 hasConcept C94671646 @default.
- W4253194480 hasConceptScore W4253194480C104317684 @default.
- W4253194480 hasConceptScore W4253194480C104397665 @default.
- W4253194480 hasConceptScore W4253194480C105565629 @default.
- W4253194480 hasConceptScore W4253194480C141231307 @default.
- W4253194480 hasConceptScore W4253194480C197077220 @default.
- W4253194480 hasConceptScore W4253194480C36823959 @default.
- W4253194480 hasConceptScore W4253194480C54355233 @default.
- W4253194480 hasConceptScore W4253194480C70721500 @default.
- W4253194480 hasConceptScore W4253194480C86803240 @default.
- W4253194480 hasConceptScore W4253194480C91779695 @default.
- W4253194480 hasConceptScore W4253194480C94671646 @default.
- W4253194480 hasLocation W42531944801 @default.
- W4253194480 hasLocation W42531944802 @default.
- W4253194480 hasLocation W42531944803 @default.
- W4253194480 hasOpenAccess W4253194480 @default.
- W4253194480 hasPrimaryLocation W42531944801 @default.
- W4253194480 hasRelatedWork W1999232590 @default.
- W4253194480 hasRelatedWork W2039330132 @default.
- W4253194480 hasRelatedWork W2085975240 @default.
- W4253194480 hasRelatedWork W2090999728 @default.
- W4253194480 hasRelatedWork W2116688916 @default.
- W4253194480 hasRelatedWork W2131379448 @default.
- W4253194480 hasRelatedWork W2139592023 @default.
- W4253194480 hasRelatedWork W2418488235 @default.
- W4253194480 hasRelatedWork W2549117626 @default.
- W4253194480 hasRelatedWork W63888384 @default.
- W4253194480 isParatext "false" @default.
- W4253194480 isRetracted "false" @default.
- W4253194480 workType "article" @default.