Matches in SemOpenAlex for { <https://semopenalex.org/work/W4384131029> ?p ?o ?g. }
- W4384131029 endingPage "e1180" @default.
- W4384131029 startingPage "e1180" @default.
- W4384131029 abstract "Background: Advances in sequencing technology have increased the number of genomes being sequenced. However, obtaining a high-quality genome sequence remains a challenge, because genome assembly must piece together a massive number of short strings (reads) in the presence of repetitive sequences (repeats). Computer algorithms for genome assembly construct the entire genome from reads using one of two approaches. The de novo approach concatenates reads based on exact matches between their suffixes and prefixes (overlapping). The reference-guided approach orders reads based on their offsets in a well-known reference genome (read alignment). Repeats increase ambiguity, leaving the algorithm unable to distinguish between reads, which results in misassembly and reduces assembly accuracy. In addition, the massive number of reads poses a major performance challenge. Method: Repeat identification methods were introduced to address misassembly by identifying repetitive sequences beforehand and creating a repeat knowledge base that reduces ambiguity during the assembly process, thus enhancing the accuracy of the assembled genome. Also, hybridization between assembly approaches resulted in a lower degree of misassembly with the aid of the reference genome. Assembly performance is optimized through data structure indexing and parallelization. This article’s primary aim and contribution is an extensive review that eases researchers’ search for genome assembly studies. The study also highlights the most recent developments and limitations in genome assembly accuracy and performance optimization. Results: Our findings show the limitations of the available repeat identification methods, which can detect only repeats of specific lengths and may not perform well when various types of repeats are present in a genome. 
We also found that most hybrid assembly approaches, whether they start with de novo or reference-guided assembly, have limitations in handling repetitive sequences, as doing so is more computationally costly and time-intensive. Although the hybrid approach was found to outperform the individual assembly approaches, optimizing its performance remains a challenge. Also, parallelization of overlapping and read alignment is yet to be fully implemented in the hybrid assembly approach. Conclusion: We suggest combining multiple repeat identification methods to identify repeats more accurately as an initial step in the hybrid assembly approach, and combining genome indexing with parallelization to better optimize its performance." @default.
- W4384131029 created "2023-07-14" @default.
- W4384131029 creator A5006438449 @default.
- W4384131029 creator A5036118680 @default.
- W4384131029 creator A5052423409 @default.
- W4384131029 creator A5075574029 @default.
- W4384131029 creator A5090577706 @default.
- W4384131029 date "2023-07-13" @default.
- W4384131029 modified "2023-09-26" @default.
- W4384131029 title "Genome assembly composition of the String “ACGT” array: a review of data structure accuracy and performance challenges" @default.
- W4384131029 cites W1505191356 @default.
- W4384131029 cites W1957444659 @default.
- W4384131029 cites W1978732934 @default.
- W4384131029 cites W1999030333 @default.
- W4384131029 cites W2015750881 @default.
- W4384131029 cites W2017120726 @default.
- W4384131029 cites W2028240922 @default.
- W4384131029 cites W2035543796 @default.
- W4384131029 cites W2047168584 @default.
- W4384131029 cites W2108640362 @default.
- W4384131029 cites W2120015325 @default.
- W4384131029 cites W2148072340 @default.
- W4384131029 cites W2174696350 @default.
- W4384131029 cites W2259812802 @default.
- W4384131029 cites W2300605489 @default.
- W4384131029 cites W2400167572 @default.
- W4384131029 cites W2511260145 @default.
- W4384131029 cites W2588090883 @default.
- W4384131029 cites W2592783371 @default.
- W4384131029 cites W2604585222 @default.
- W4384131029 cites W2604604482 @default.
- W4384131029 cites W2606325896 @default.
- W4384131029 cites W2606519017 @default.
- W4384131029 cites W2751464430 @default.
- W4384131029 cites W2761301729 @default.
- W4384131029 cites W2766771288 @default.
- W4384131029 cites W2767590279 @default.
- W4384131029 cites W2767874009 @default.
- W4384131029 cites W2776573029 @default.
- W4384131029 cites W2781865485 @default.
- W4384131029 cites W2781939187 @default.
- W4384131029 cites W2783825348 @default.
- W4384131029 cites W2783973318 @default.
- W4384131029 cites W2790220607 @default.
- W4384131029 cites W2792005838 @default.
- W4384131029 cites W2794893153 @default.
- W4384131029 cites W2797601685 @default.
- W4384131029 cites W2803011486 @default.
- W4384131029 cites W2803140815 @default.
- W4384131029 cites W2808352813 @default.
- W4384131029 cites W2896324799 @default.
- W4384131029 cites W2899544675 @default.
- W4384131029 cites W2950121380 @default.
- W4384131029 cites W2962686126 @default.
- W4384131029 cites W2967439527 @default.
- W4384131029 cites W2993246561 @default.
- W4384131029 cites W3000676906 @default.
- W4384131029 cites W3006459350 @default.
- W4384131029 cites W3012376967 @default.
- W4384131029 cites W3021001696 @default.
- W4384131029 cites W3022783334 @default.
- W4384131029 cites W3085526196 @default.
- W4384131029 cites W3085758516 @default.
- W4384131029 cites W3093613458 @default.
- W4384131029 cites W3118862598 @default.
- W4384131029 cites W3119220392 @default.
- W4384131029 cites W3120712761 @default.
- W4384131029 cites W3125993919 @default.
- W4384131029 cites W3135903892 @default.
- W4384131029 cites W3171794097 @default.
- W4384131029 cites W3174489946 @default.
- W4384131029 cites W3213656100 @default.
- W4384131029 cites W4239243392 @default.
- W4384131029 cites W4252729214 @default.
- W4384131029 cites W4311025226 @default.
- W4384131029 doi "https://doi.org/10.7717/peerj-cs.1180" @default.
- W4384131029 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/37547391" @default.
- W4384131029 hasPublicationYear "2023" @default.
- W4384131029 type Work @default.
- W4384131029 citedByCount "0" @default.
- W4384131029 crossrefType "journal-article" @default.
- W4384131029 hasAuthorship W4384131029A5006438449 @default.
- W4384131029 hasAuthorship W4384131029A5036118680 @default.
- W4384131029 hasAuthorship W4384131029A5052423409 @default.
- W4384131029 hasAuthorship W4384131029A5075574029 @default.
- W4384131029 hasAuthorship W4384131029A5090577706 @default.
- W4384131029 hasBestOaLocation W43841310291 @default.
- W4384131029 hasConcept C104317684 @default.
- W4384131029 hasConcept C113425843 @default.
- W4384131029 hasConcept C11413529 @default.
- W4384131029 hasConcept C116834253 @default.
- W4384131029 hasConcept C124101348 @default.
- W4384131029 hasConcept C141231307 @default.
- W4384131029 hasConcept C150194340 @default.
- W4384131029 hasConcept C157486923 @default.
- W4384131029 hasConcept C162317418 @default.
- W4384131029 hasConcept C18949551 @default.
- W4384131029 hasConcept C192953774 @default.