Matches in SemOpenAlex for { <https://semopenalex.org/work/W2948391508> ?p ?o ?g. }
- W2948391508 abstract "As the cost of DNA sequencing decreases, high-throughput sequencing technologies become more and more accessible to many laboratories. Consequently, new issues emerge that require new algorithms, including tools for indexing and compressing thousands of genomes, as for example in the 3000 rice genomes project [1], in which we are particularly interested. Genomes can be considered as very large texts over a simple alphabet Σ = {A, C, G, T}. We can refer to the indexable dictionary problem, which consists in storing a set S ⊆ {0, ..., i, ..., m − 1} from a universe U with |U| = n as a bit vector B, where B[i] = 1 ⟺ i ∈ S. The indexable dictionary problem supports two additional operations, rank_s(i) and select_s(i), for s ∈ {0, 1}. The function rank_s(i) returns the number of elements equal to s up to position i, and select_s(i) returns the position of the i-th occurrence of s. The indexing of complete genomes is an important stage in the exploration and understanding of data from living organisms. An efficient index should provide a quick answer to the following questions: How many times does a given pattern appear in the genome? What are the positions of a given pattern? What is the pattern length at the i-th position in the genome? The common way to index and compress one genome is to use the Burrows-Wheeler Transform (BWT) [2] with the FM-index [3] on BWT sequences for queries. To index several genomes against one reference genome, one may use MuGI [4]. To build the MuGI index, the authors store the reference in compact form (4 bits to encode a single character), a variant database, one bit vector for each variant, and an array kMA keeping information about each k-mer. This is a really interesting approach, but it requires a reference genome. We present a structure which proposes a solution to index and compress very repetitive sequences over a small alphabet using k-mers. k-mers are factors of length k in the considered sequences. 
We built an array of size 4^k1, where k1 < k, in which each entry, itself an array, is indexed by a prefix of size k1 of the existing k-mers. In each prefix array we insert a bit vector of 4^k2 bits which represents all possible k-mers beginning with the considered prefix. We will use libGkArray [5] to query large read collections and update our structure. We chose libGkArray rather than JellyFish [6] or KMC (any version) [7] in main memory. To build the index, we cut our genomes into k-mers, and we split each k-mer into a prefix and a suffix of respective sizes k1 and k2. We call the function kmer_to_int(), which takes a k-mer and returns its integer value. We then go into the prefix array PA[kmer_to_int(k1)] and add k2 to our suffix array. We also add a 1 in the succinct structure for G_i, where i indexes the n genomes, as shown in Fig. 1. Given n, the number of genomes, and N, the number of k-mers in the genome set, we can estimate the time and space complexity as O(N log(n)) and O(N × 2k2 log(n + N)) respectively. Our structure has to be efficient in memory space and computing time." @default.
- W2948391508 created "2019-06-14" @default.
- W2948391508 creator A5001451722 @default.
- W2948391508 creator A5002675973 @default.
- W2948391508 creator A5044357238 @default.
- W2948391508 creator A5070180290 @default.
- W2948391508 creator A5077668114 @default.
- W2948391508 date "2017-01-01" @default.
- W2948391508 modified "2023-09-24" @default.
- W2948391508 title "Development of indexing compressed structure for analyzing a collection of similar genomes: application to rice" @default.
- W2948391508 hasPublicationYear "2017" @default.
- W2948391508 type Work @default.
- W2948391508 sameAs 2948391508 @default.
- W2948391508 citedByCount "0" @default.
- W2948391508 crossrefType "journal-article" @default.
- W2948391508 hasAuthorship W2948391508A5001451722 @default.
- W2948391508 hasAuthorship W2948391508A5002675973 @default.
- W2948391508 hasAuthorship W2948391508A5044357238 @default.
- W2948391508 hasAuthorship W2948391508A5070180290 @default.
- W2948391508 hasAuthorship W2948391508A5077668114 @default.
- W2948391508 hasConcept C10138342 @default.
- W2948391508 hasConcept C104317684 @default.
- W2948391508 hasConcept C124101348 @default.
- W2948391508 hasConcept C136764020 @default.
- W2948391508 hasConcept C14036430 @default.
- W2948391508 hasConcept C141231307 @default.
- W2948391508 hasConcept C162324750 @default.
- W2948391508 hasConcept C177264268 @default.
- W2948391508 hasConcept C192953774 @default.
- W2948391508 hasConcept C198082294 @default.
- W2948391508 hasConcept C199360897 @default.
- W2948391508 hasConcept C23123220 @default.
- W2948391508 hasConcept C2777382242 @default.
- W2948391508 hasConcept C41008148 @default.
- W2948391508 hasConcept C51679486 @default.
- W2948391508 hasConcept C54355233 @default.
- W2948391508 hasConcept C552990157 @default.
- W2948391508 hasConcept C75165309 @default.
- W2948391508 hasConcept C80444323 @default.
- W2948391508 hasConcept C86803240 @default.
- W2948391508 hasConceptScore W2948391508C10138342 @default.
- W2948391508 hasConceptScore W2948391508C104317684 @default.
- W2948391508 hasConceptScore W2948391508C124101348 @default.
- W2948391508 hasConceptScore W2948391508C136764020 @default.
- W2948391508 hasConceptScore W2948391508C14036430 @default.
- W2948391508 hasConceptScore W2948391508C141231307 @default.
- W2948391508 hasConceptScore W2948391508C162324750 @default.
- W2948391508 hasConceptScore W2948391508C177264268 @default.
- W2948391508 hasConceptScore W2948391508C192953774 @default.
- W2948391508 hasConceptScore W2948391508C198082294 @default.
- W2948391508 hasConceptScore W2948391508C199360897 @default.
- W2948391508 hasConceptScore W2948391508C23123220 @default.
- W2948391508 hasConceptScore W2948391508C2777382242 @default.
- W2948391508 hasConceptScore W2948391508C41008148 @default.
- W2948391508 hasConceptScore W2948391508C51679486 @default.
- W2948391508 hasConceptScore W2948391508C54355233 @default.
- W2948391508 hasConceptScore W2948391508C552990157 @default.
- W2948391508 hasConceptScore W2948391508C75165309 @default.
- W2948391508 hasConceptScore W2948391508C80444323 @default.
- W2948391508 hasConceptScore W2948391508C86803240 @default.
- W2948391508 hasLocation W29483915081 @default.
- W2948391508 hasOpenAccess W2948391508 @default.
- W2948391508 hasPrimaryLocation W29483915081 @default.
- W2948391508 hasRelatedWork W1789533689 @default.
- W2948391508 hasRelatedWork W1938985178 @default.
- W2948391508 hasRelatedWork W1970825859 @default.
- W2948391508 hasRelatedWork W2107043533 @default.
- W2948391508 hasRelatedWork W2108278098 @default.
- W2948391508 hasRelatedWork W2161938817 @default.
- W2948391508 hasRelatedWork W2238365835 @default.
- W2948391508 hasRelatedWork W2408868331 @default.
- W2948391508 hasRelatedWork W2487384794 @default.
- W2948391508 hasRelatedWork W2602978558 @default.
- W2948391508 hasRelatedWork W2886951692 @default.
- W2948391508 hasRelatedWork W2943219835 @default.
- W2948391508 hasRelatedWork W2949360255 @default.
- W2948391508 hasRelatedWork W2953622720 @default.
- W2948391508 hasRelatedWork W2955966015 @default.
- W2948391508 hasRelatedWork W3128964673 @default.
- W2948391508 hasRelatedWork W3131616027 @default.
- W2948391508 hasRelatedWork W616987689 @default.
- W2948391508 hasRelatedWork W76210463 @default.
- W2948391508 hasRelatedWork W820014006 @default.
- W2948391508 isParatext "false" @default.
- W2948391508 isRetracted "false" @default.
- W2948391508 magId "2948391508" @default.
- W2948391508 workType "article" @default.
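
The abstract above describes rank/select on a bit vector and a k-mer index split into a prefix of length k1 and a suffix of length k2. The following is a minimal Python sketch of those two ideas, not the authors' implementation. Only the name kmer_to_int comes from the abstract; every other name is hypothetical. It assumes the encoding A=0, C=1, G=2, T=3, sequences containing only those four letters, rank counting inclusively up to position i, and select using 1-based occurrence numbers. For brevity it replaces the 4^k2-bit vectors and the succinct per-genome structure with a plain dict and an integer bitmask.

```python
# Illustrative sketch only. Apart from kmer_to_int, all names are hypothetical.

# Assumed 2-bit encoding of nucleotides; the abstract does not specify one.
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}


def rank(bits, s, i):
    """Count occurrences of bit value s in bits[0..i], inclusive (naive O(i))."""
    return sum(1 for b in bits[: i + 1] if b == s)


def select(bits, s, j):
    """Return the position of the j-th (1-based) occurrence of s, or -1."""
    seen = 0
    for pos, b in enumerate(bits):
        if b == s:
            seen += 1
            if seen == j:
                return pos
    return -1


def kmer_to_int(kmer):
    """Interpret a string over {A, C, G, T} as a base-4 integer."""
    value = 0
    for base in kmer:
        value = value * 4 + ENCODING[base]
    return value


def build_index(genomes, k1, k2):
    """Build a prefix array with 4**k1 entries.

    Each entry maps a suffix code to an integer bitmask whose bit g is set
    when genome number g contains the corresponding k-mer (k = k1 + k2).
    The dict and bitmask are simplifications chosen for this sketch.
    """
    k = k1 + k2
    prefix_array = [{} for _ in range(4 ** k1)]
    for g, seq in enumerate(genomes):
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            prefix = kmer_to_int(kmer[:k1])
            suffix = kmer_to_int(kmer[k1:])
            entry = prefix_array[prefix]
            entry[suffix] = entry.get(suffix, 0) | (1 << g)
    return prefix_array


def genomes_containing(prefix_array, kmer, k1):
    """List the genome numbers whose sequence contains the given k-mer."""
    prefix = kmer_to_int(kmer[:k1])
    suffix = kmer_to_int(kmer[k1:])
    mask = prefix_array[prefix].get(suffix, 0)
    return [g for g in range(mask.bit_length()) if (mask >> g) & 1]
```

For example, calling build_index(["ACGTAC", "CGTACG"], 2, 2) and then genomes_containing(index, "CGTA", 2) reports both genomes, since the 4-mer CGTA occurs in each. A real implementation would swap the naive rank and select loops for a succinct bit-vector library to reach the memory and time goals the abstract states.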