Matches in SemOpenAlex for { <https://semopenalex.org/work/W2949293135> ?p ?o ?g. }
- W2949293135 abstract "Distributed approaches based on the map-reduce programming paradigm have started to be proposed in the bioinformatics domain, due to the large amount of data produced by next-generation sequencing techniques. However, the use of map-reduce and related Big Data technologies and frameworks (e.g., Apache Hadoop and Spark) does not necessarily produce satisfactory results, in terms of both efficiency and effectiveness. We discuss how the development of distributed and Big Data management technologies has affected the analysis of large datasets of biological sequences. Moreover, we show how the choice of different parameter configurations and the careful engineering of the software with respect to the specific framework under consideration may be crucial in order to achieve good performance, especially on very large amounts of data. We choose k-mer counting as a case study for our analysis, and Spark as the framework to implement FastKmer, a novel approach for the extraction of k-mer statistics from large collections of biological sequences, with arbitrary values of k. One of the most relevant contributions of FastKmer is the introduction of a module for balancing the statistics aggregation workload over the nodes of a computing cluster, in order to overcome data skew while allowing for full exploitation of the underlying distributed architecture. We also present the results of a comparative experimental analysis showing that our approach is currently the fastest among the ones based on Big Data technologies, while exhibiting very good scalability. We provide evidence that the usage of technologies such as Hadoop or Spark for the analysis of big datasets of biological sequences is productive only if the architectural details and the peculiar aspects of the considered framework are carefully taken into account in the algorithm design and implementation." @default.
- W2949293135 created "2019-06-27" @default.
- W2949293135 creator A5018051509 @default.
- W2949293135 creator A5032279305 @default.
- W2949293135 creator A5047319643 @default.
- W2949293135 creator A5066774697 @default.
- W2949293135 creator A5078881318 @default.
- W2949293135 date "2018-07-04" @default.
- W2949293135 modified "2023-09-27" @default.
- W2949293135 title "Analyzing Big Datasets of Genomic Sequences: Fast and Scalable Collection of k-mer Statistics" @default.
- W2949293135 cites W1510543252 @default.
- W2949293135 cites W1570906644 @default.
- W2949293135 cites W1829813581 @default.
- W2949293135 cites W1891617912 @default.
- W2949293135 cites W1969346416 @default.
- W2949293135 cites W2010473599 @default.
- W2949293135 cites W2034337154 @default.
- W2949293135 cites W2041391522 @default.
- W2949293135 cites W2044805330 @default.
- W2949293135 cites W2098935637 @default.
- W2949293135 cites W2104680817 @default.
- W2949293135 cites W2105947650 @default.
- W2949293135 cites W2111307596 @default.
- W2949293135 cites W2117608012 @default.
- W2949293135 cites W2124626190 @default.
- W2949293135 cites W2125266506 @default.
- W2949293135 cites W2134814259 @default.
- W2949293135 cites W2136290019 @default.
- W2949293135 cites W2138486754 @default.
- W2949293135 cites W2144560237 @default.
- W2949293135 cites W2148043549 @default.
- W2949293135 cites W2149059931 @default.
- W2949293135 cites W2173213060 @default.
- W2949293135 cites W2173874602 @default.
- W2949293135 cites W2189371416 @default.
- W2949293135 cites W2339602899 @default.
- W2949293135 cites W2499030501 @default.
- W2949293135 cites W2568237575 @default.
- W2949293135 cites W2583363792 @default.
- W2949293135 cites W2607486241 @default.
- W2949293135 hasPublicationYear "2018" @default.
- W2949293135 type Work @default.
- W2949293135 sameAs 2949293135 @default.
- W2949293135 citedByCount "0" @default.
- W2949293135 crossrefType "posted-content" @default.
- W2949293135 hasAuthorship W2949293135A5018051509 @default.
- W2949293135 hasAuthorship W2949293135A5032279305 @default.
- W2949293135 hasAuthorship W2949293135A5047319643 @default.
- W2949293135 hasAuthorship W2949293135A5066774697 @default.
- W2949293135 hasAuthorship W2949293135A5078881318 @default.
- W2949293135 hasConcept C111919701 @default.
- W2949293135 hasConcept C124101348 @default.
- W2949293135 hasConcept C134306372 @default.
- W2949293135 hasConcept C199360897 @default.
- W2949293135 hasConcept C2522767166 @default.
- W2949293135 hasConcept C2777904410 @default.
- W2949293135 hasConcept C2778476105 @default.
- W2949293135 hasConcept C2781215313 @default.
- W2949293135 hasConcept C33923547 @default.
- W2949293135 hasConcept C36503486 @default.
- W2949293135 hasConcept C41008148 @default.
- W2949293135 hasConcept C43711488 @default.
- W2949293135 hasConcept C48044578 @default.
- W2949293135 hasConcept C75684735 @default.
- W2949293135 hasConcept C76155785 @default.
- W2949293135 hasConcept C77088390 @default.
- W2949293135 hasConceptScore W2949293135C111919701 @default.
- W2949293135 hasConceptScore W2949293135C124101348 @default.
- W2949293135 hasConceptScore W2949293135C134306372 @default.
- W2949293135 hasConceptScore W2949293135C199360897 @default.
- W2949293135 hasConceptScore W2949293135C2522767166 @default.
- W2949293135 hasConceptScore W2949293135C2777904410 @default.
- W2949293135 hasConceptScore W2949293135C2778476105 @default.
- W2949293135 hasConceptScore W2949293135C2781215313 @default.
- W2949293135 hasConceptScore W2949293135C33923547 @default.
- W2949293135 hasConceptScore W2949293135C36503486 @default.
- W2949293135 hasConceptScore W2949293135C41008148 @default.
- W2949293135 hasConceptScore W2949293135C43711488 @default.
- W2949293135 hasConceptScore W2949293135C48044578 @default.
- W2949293135 hasConceptScore W2949293135C75684735 @default.
- W2949293135 hasConceptScore W2949293135C76155785 @default.
- W2949293135 hasConceptScore W2949293135C77088390 @default.
- W2949293135 hasLocation W29492931351 @default.
- W2949293135 hasOpenAccess W2949293135 @default.
- W2949293135 hasPrimaryLocation W29492931351 @default.
- W2949293135 hasRelatedWork W1512615396 @default.
- W2949293135 hasRelatedWork W1971155008 @default.
- W2949293135 hasRelatedWork W2102294813 @default.
- W2949293135 hasRelatedWork W2233468137 @default.
- W2949293135 hasRelatedWork W2285253616 @default.
- W2949293135 hasRelatedWork W2532095873 @default.
- W2949293135 hasRelatedWork W2584045454 @default.
- W2949293135 hasRelatedWork W2607929867 @default.
- W2949293135 hasRelatedWork W2742208603 @default.
- W2949293135 hasRelatedWork W2766461277 @default.
- W2949293135 hasRelatedWork W2885988341 @default.
- W2949293135 hasRelatedWork W2949426525 @default.
- W2949293135 hasRelatedWork W2995218600 @default.
- W2949293135 hasRelatedWork W3012264682 @default.
- W2949293135 hasRelatedWork W3093765838 @default.