Matches in SemOpenAlex for { <https://semopenalex.org/work/W2211814384> ?p ?o ?g. }
- W2211814384 abstract "Comparative metagenomics aims to provide high-level information based on DNA material sequenced from different environments. The purpose is mainly to estimate the proximity between two or more environmental sites at the genomic level. One way to estimate similarity is to count the number of similar DNA fragments. From a computational point of view, the problem is thus to calculate the intersections between datasets of reads. Resorting to traditional methods such as all-versus-all sequence alignment is not possible on current metagenomic projects. For instance, the Tara Oceans project involves hundreds of datasets of more than 100M reads each. Maillet et al. defined the following heuristic in their method called Commet [1]: two reads are considered similar if they share t non-overlapping kmers (words of length k). This method is currently the fastest but still does not scale to Tara Oceans samples. To tackle this issue, we introduce a new method, called Simka [2], which computes the similarity between two datasets based on their shared kmers. To scale to large metagenomic projects, we use the GATB library [3], which provides a kmer counting tool able to count the kmers of N datasets simultaneously. Counting kmers also offers new possibilities such as filtering low-frequency kmers, which potentially contain sequencing errors. Simka also efficiently provides new similarity functions. The first is based on Bray-Curtis, a well-known similarity function in ecology, which provides information on species abundance. The second computes the Jaccard similarity between the datasets and thus provides information on the presence and absence of species. Simka was tested and compared to the state of the art on 21 Tara Oceans samples. The results show that our kmer-based similarity function is very close to the read-based ones. Regarding sample proximity, the different methods identify the same clusters of datasets. The fastest method of the state of the art required a few weeks to compute all the intersections, whereas Simka took only 4 hours. [1] COMMET: comparing and combining multiple metagenomic datasets. N. Maillet, G. Collet, T. Vannier, D. Lavenier, P. Peterlongo. IEEE BIBM, 2014. [2] Simka: fast kmer-based method for estimating the similarity between numerous metagenomic datasets. G. Benoit, P. Peterlongo, D. Lavenier, C. Lemaitre. Hal – Inria, 2015. [3] GATB: Genome Assembly & Analysis Tool Box. E. Drezen, G. Rizk, R. Chikhi, C. Deltel, C. Lemaitre, P. Peterlongo, D. Lavenier. 10.1093/Bioinformatics/btu406, 2014." @default.
- W2211814384 created "2016-06-24" @default.
- W2211814384 creator A5012897398 @default.
- W2211814384 creator A5065622224 @default.
- W2211814384 creator A5087524785 @default.
- W2211814384 date "2015-07-13" @default.
- W2211814384 modified "2023-09-25" @default.
- W2211814384 title "Fast kmer-based method for estimating the similarity between numerous metagenomic datasets" @default.
- W2211814384 doi "https://doi.org/10.7490/f1000research.1000211.1" @default.
- W2211814384 hasPublicationYear "2015" @default.
- W2211814384 type Work @default.
- W2211814384 sameAs 2211814384 @default.
- W2211814384 citedByCount "0" @default.
- W2211814384 crossrefType "journal-article" @default.
- W2211814384 hasAuthorship W2211814384A5012897398 @default.
- W2211814384 hasAuthorship W2211814384A5065622224 @default.
- W2211814384 hasAuthorship W2211814384A5087524785 @default.
- W2211814384 hasConcept C103278499 @default.
- W2211814384 hasConcept C104317684 @default.
- W2211814384 hasConcept C115961682 @default.
- W2211814384 hasConcept C124101348 @default.
- W2211814384 hasConcept C14036430 @default.
- W2211814384 hasConcept C15151743 @default.
- W2211814384 hasConcept C153180895 @default.
- W2211814384 hasConcept C154945302 @default.
- W2211814384 hasConcept C173801870 @default.
- W2211814384 hasConcept C203519979 @default.
- W2211814384 hasConcept C205649164 @default.
- W2211814384 hasConcept C23123220 @default.
- W2211814384 hasConcept C2778112365 @default.
- W2211814384 hasConcept C2778755073 @default.
- W2211814384 hasConcept C41008148 @default.
- W2211814384 hasConcept C54355233 @default.
- W2211814384 hasConcept C55493867 @default.
- W2211814384 hasConcept C58640448 @default.
- W2211814384 hasConcept C70721500 @default.
- W2211814384 hasConcept C78458016 @default.
- W2211814384 hasConcept C86803240 @default.
- W2211814384 hasConceptScore W2211814384C103278499 @default.
- W2211814384 hasConceptScore W2211814384C104317684 @default.
- W2211814384 hasConceptScore W2211814384C115961682 @default.
- W2211814384 hasConceptScore W2211814384C124101348 @default.
- W2211814384 hasConceptScore W2211814384C14036430 @default.
- W2211814384 hasConceptScore W2211814384C15151743 @default.
- W2211814384 hasConceptScore W2211814384C153180895 @default.
- W2211814384 hasConceptScore W2211814384C154945302 @default.
- W2211814384 hasConceptScore W2211814384C173801870 @default.
- W2211814384 hasConceptScore W2211814384C203519979 @default.
- W2211814384 hasConceptScore W2211814384C205649164 @default.
- W2211814384 hasConceptScore W2211814384C23123220 @default.
- W2211814384 hasConceptScore W2211814384C2778112365 @default.
- W2211814384 hasConceptScore W2211814384C2778755073 @default.
- W2211814384 hasConceptScore W2211814384C41008148 @default.
- W2211814384 hasConceptScore W2211814384C54355233 @default.
- W2211814384 hasConceptScore W2211814384C55493867 @default.
- W2211814384 hasConceptScore W2211814384C58640448 @default.
- W2211814384 hasConceptScore W2211814384C70721500 @default.
- W2211814384 hasConceptScore W2211814384C78458016 @default.
- W2211814384 hasConceptScore W2211814384C86803240 @default.
- W2211814384 hasLocation W22118143841 @default.
- W2211814384 hasOpenAccess W2211814384 @default.
- W2211814384 hasPrimaryLocation W22118143841 @default.
- W2211814384 hasRelatedWork W2056930789 @default.
- W2211814384 hasRelatedWork W2107423521 @default.
- W2211814384 hasRelatedWork W2122555405 @default.
- W2211814384 hasRelatedWork W2125266506 @default.
- W2211814384 hasRelatedWork W2153833971 @default.
- W2211814384 hasRelatedWork W2254199494 @default.
- W2211814384 hasRelatedWork W2264626985 @default.
- W2211814384 hasRelatedWork W2337747100 @default.
- W2211814384 hasRelatedWork W2346379011 @default.
- W2211814384 hasRelatedWork W2415963323 @default.
- W2211814384 hasRelatedWork W2518374209 @default.
- W2211814384 hasRelatedWork W2892689606 @default.
- W2211814384 hasRelatedWork W2951540755 @default.
- W2211814384 hasRelatedWork W2951934242 @default.
- W2211814384 hasRelatedWork W2962840130 @default.
- W2211814384 hasRelatedWork W3097652663 @default.
- W2211814384 hasRelatedWork W3126228859 @default.
- W2211814384 hasRelatedWork W3126774196 @default.
- W2211814384 hasRelatedWork W3139286063 @default.
- W2211814384 hasRelatedWork W3157644645 @default.
- W2211814384 hasVolume "4" @default.
- W2211814384 isParatext "false" @default.
- W2211814384 isRetracted "false" @default.
- W2211814384 magId "2211814384" @default.
- W2211814384 workType "article" @default.
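
The abstract of this record describes comparing datasets through their shared kmers, filtering low-frequency kmers, and scoring similarity with a Jaccard (presence/absence) measure and a Bray-Curtis (abundance) measure. The sketch below illustrates those ideas on toy data. It is not Simka's or GATB's implementation: the function names, the `min_abundance` parameter, and the exact formulas are assumptions chosen for illustration, and it ignores reverse complements and any disk-based counting that a real tool would need.

```python
from collections import Counter


def count_kmers(reads, k, min_abundance=1):
    """Count all kmers (words of length k) in a list of reads.

    Kmers seen fewer than `min_abundance` times are dropped, as a simple
    stand-in for filtering low-frequency kmers that may hold sequencing errors.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter({kmer: c for kmer, c in counts.items() if c >= min_abundance})


def jaccard_similarity(a, b):
    """Presence/absence similarity: shared distinct kmers over all distinct kmers."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    if not union:
        return 0.0
    return len(keys_a & keys_b) / len(union)


def bray_curtis_similarity(a, b):
    """Abundance-based similarity: 2 * sum(min counts) / (total counts of both)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(min(a[kmer], b[kmer]) for kmer in a.keys() & b.keys())
    return 2 * shared / total


if __name__ == "__main__":
    # Hypothetical toy "datasets"; real samples hold millions of reads.
    sample_a = ["ACGTACGT", "ACGTTT"]
    sample_b = ["ACGTACGA", "TTTTTT"]
    kmers_a = count_kmers(sample_a, k=4)
    kmers_b = count_kmers(sample_b, k=4)
    print("Jaccard:", jaccard_similarity(kmers_a, kmers_b))
    print("Bray-Curtis:", bray_curtis_similarity(kmers_a, kmers_b))
```

Because both measures work on kmer counts alone, no read-to-read comparison is needed, which is the property the abstract credits for the method's scalability.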