Matches in SemOpenAlex for { <https://semopenalex.org/work/W2126600488> ?p ?o ?g. }
- W2126600488 abstract "High-throughput molecular profiling data has been used to improve clinical decision making by stratifying subjects based on their molecular profiles. Unsupervised clustering algorithms can be used for stratification purposes. However, the current speed of clustering algorithms cannot meet the requirements of large-scale molecular data because the correlation matrix calculation performs poorly. With high-throughput sequencing technologies promising to produce even larger datasets per subject, we expect the performance of state-of-the-art statistical algorithms to be further impacted unless efforts towards optimisation are carried out. MapReduce is a widely used, high-performance parallel framework that can address this problem. In this paper, we evaluate the current parallel modes for correlation calculation methods and introduce an efficient data distribution and parallel calculation algorithm based on MapReduce to optimise the correlation calculation. We studied the performance of our algorithm using two gene expression benchmarks. In the micro-benchmark, our MapReduce implementation, based on the R package RHIPE, demonstrates a 3.26-5.83-fold speed-up over the default Snowfall and a 1.56-1.64-fold speed-up over the basic RHIPE for the Euclidean, Pearson and Spearman correlations. Although vanilla R and the optimised Snowfall outperform our optimised RHIPE in the micro-benchmark, they do not scale well to the macro-benchmark. In the macro-benchmark, the optimised RHIPE runs 2.03-16.56 times faster than vanilla R. Benefiting from 3.30-5.13 times faster data preparation, the optimised RHIPE runs 1.22-1.71 times faster than the optimised Snowfall. Both the optimised RHIPE and the optimised Snowfall successfully compute the Kendall correlation on the TCGA dataset within 7 hours, more than 30 times faster than the estimated vanilla R runtime.
The performance evaluation found that the new MapReduce algorithm and its implementation in RHIPE outperform vanilla R and the conventional parallel algorithms implemented in R Snowfall. We propose that the MapReduce framework holds great promise for large molecular data analysis, in particular for high-dimensional genomic data such as that used in the performance evaluation described in this paper. We aim to use this new algorithm as a basis for optimising correlation calculations on high-throughput molecular data at Big Data scale." @default.
- W2126600488 created "2016-06-24" @default.
- W2126600488 creator A5030238684 @default.
- W2126600488 creator A5045081171 @default.
- W2126600488 creator A5048002157 @default.
- W2126600488 creator A5051201519 @default.
- W2126600488 creator A5051765148 @default.
- W2126600488 creator A5069959879 @default.
- W2126600488 creator A5079707047 @default.
- W2126600488 date "2014-11-05" @default.
- W2126600488 modified "2023-10-09" @default.
- W2126600488 title "Optimising parallel R correlation matrix calculations on gene expression data using MapReduce" @default.
- W2126600488 cites W1513026405 @default.
- W2126600488 cites W1994041743 @default.
- W2126600488 cites W2005947278 @default.
- W2126600488 cites W2021560851 @default.
- W2126600488 cites W2051685933 @default.
- W2126600488 cites W2055933691 @default.
- W2126600488 cites W2078483536 @default.
- W2126600488 cites W2095430247 @default.
- W2126600488 cites W2096283457 @default.
- W2126600488 cites W2104074461 @default.
- W2126600488 cites W2106665049 @default.
- W2126600488 cites W2116511687 @default.
- W2126600488 cites W2118526609 @default.
- W2126600488 cites W2168552795 @default.
- W2126600488 cites W2173213060 @default.
- W2126600488 cites W4211135634 @default.
- W2126600488 doi "https://doi.org/10.1186/s12859-014-0351-9" @default.
- W2126600488 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/4246436" @default.
- W2126600488 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/25371114" @default.
- W2126600488 hasPublicationYear "2014" @default.
- W2126600488 type Work @default.
- W2126600488 sameAs 2126600488 @default.
- W2126600488 citedByCount "20" @default.
- W2126600488 countsByYear W21266004882015 @default.
- W2126600488 countsByYear W21266004882016 @default.
- W2126600488 countsByYear W21266004882017 @default.
- W2126600488 countsByYear W21266004882018 @default.
- W2126600488 countsByYear W21266004882019 @default.
- W2126600488 countsByYear W21266004882020 @default.
- W2126600488 countsByYear W21266004882021 @default.
- W2126600488 countsByYear W21266004882023 @default.
- W2126600488 crossrefType "journal-article" @default.
- W2126600488 hasAuthorship W2126600488A5030238684 @default.
- W2126600488 hasAuthorship W2126600488A5045081171 @default.
- W2126600488 hasAuthorship W2126600488A5048002157 @default.
- W2126600488 hasAuthorship W2126600488A5051201519 @default.
- W2126600488 hasAuthorship W2126600488A5051765148 @default.
- W2126600488 hasAuthorship W2126600488A5069959879 @default.
- W2126600488 hasAuthorship W2126600488A5079707047 @default.
- W2126600488 hasBestOaLocation W21266004881 @default.
- W2126600488 hasConcept C105795698 @default.
- W2126600488 hasConcept C11413529 @default.
- W2126600488 hasConcept C117220453 @default.
- W2126600488 hasConcept C119857082 @default.
- W2126600488 hasConcept C124101348 @default.
- W2126600488 hasConcept C13280743 @default.
- W2126600488 hasConcept C157764524 @default.
- W2126600488 hasConcept C173608175 @default.
- W2126600488 hasConcept C185798385 @default.
- W2126600488 hasConcept C205649164 @default.
- W2126600488 hasConcept C2524010 @default.
- W2126600488 hasConcept C33923547 @default.
- W2126600488 hasConcept C41008148 @default.
- W2126600488 hasConcept C55078378 @default.
- W2126600488 hasConcept C555944384 @default.
- W2126600488 hasConcept C73555534 @default.
- W2126600488 hasConcept C76155785 @default.
- W2126600488 hasConceptScore W2126600488C105795698 @default.
- W2126600488 hasConceptScore W2126600488C11413529 @default.
- W2126600488 hasConceptScore W2126600488C117220453 @default.
- W2126600488 hasConceptScore W2126600488C119857082 @default.
- W2126600488 hasConceptScore W2126600488C124101348 @default.
- W2126600488 hasConceptScore W2126600488C13280743 @default.
- W2126600488 hasConceptScore W2126600488C157764524 @default.
- W2126600488 hasConceptScore W2126600488C173608175 @default.
- W2126600488 hasConceptScore W2126600488C185798385 @default.
- W2126600488 hasConceptScore W2126600488C205649164 @default.
- W2126600488 hasConceptScore W2126600488C2524010 @default.
- W2126600488 hasConceptScore W2126600488C33923547 @default.
- W2126600488 hasConceptScore W2126600488C41008148 @default.
- W2126600488 hasConceptScore W2126600488C55078378 @default.
- W2126600488 hasConceptScore W2126600488C555944384 @default.
- W2126600488 hasConceptScore W2126600488C73555534 @default.
- W2126600488 hasConceptScore W2126600488C76155785 @default.
- W2126600488 hasIssue "1" @default.
- W2126600488 hasLocation W21266004881 @default.
- W2126600488 hasLocation W21266004882 @default.
- W2126600488 hasLocation W21266004883 @default.
- W2126600488 hasLocation W21266004884 @default.
- W2126600488 hasLocation W21266004885 @default.
- W2126600488 hasOpenAccess W2126600488 @default.
- W2126600488 hasPrimaryLocation W21266004881 @default.
- W2126600488 hasRelatedWork W1485630101 @default.
- W2126600488 hasRelatedWork W2021762492 @default.
- W2126600488 hasRelatedWork W2047588290 @default.
- W2126600488 hasRelatedWork W2070338563 @default.
- W2126600488 hasRelatedWork W2093683727 @default.
- W2126600488 hasRelatedWork W2391655055 @default.
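The abstract above describes splitting a correlation matrix calculation into independent parallel tasks under MapReduce. The following is a minimal, generic sketch of that idea in plain Python, not the authors' RHIPE algorithm: it assumes rows of `data` are variables (e.g. genes or subjects) with no constant rows, partitions row indices into contiguous blocks, lets each "map" task compute Pearson correlations for one pair of blocks, and lets a "reduce" step assemble the symmetric matrix. All function names and the `block_size` parameter are hypothetical, chosen for illustration only.

```python
import math
from itertools import combinations_with_replacement


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # assumes sxx, syy > 0


def make_blocks(n_rows, block_size):
    """Hypothetical partitioner: split row indices into contiguous blocks."""
    return [list(range(s, min(s + block_size, n_rows)))
            for s in range(0, n_rows, block_size)]


def map_task(data, block_a, block_b):
    """Map step: emit ((i, j), r) for row pairs across two blocks.

    Only pairs with i <= j are emitted, so each unordered pair
    (including the diagonal) is computed exactly once.
    """
    out = []
    for i in block_a:
        for j in block_b:
            if i <= j:
                out.append(((i, j), pearson(data[i], data[j])))
    return out


def reduce_task(n_rows, pairs):
    """Reduce step: assemble emitted pairs into a full symmetric matrix."""
    matrix = [[0.0] * n_rows for _ in range(n_rows)]
    for (i, j), r in pairs:
        matrix[i][j] = r
        matrix[j][i] = r
    return matrix


def correlation_matrix(data, block_size=2):
    blocks = make_blocks(len(data), block_size)
    emitted = []
    # Each (block_a, block_b) task is independent; a MapReduce runtime
    # would schedule these on separate workers. Here they run sequentially.
    for block_a, block_b in combinations_with_replacement(blocks, 2):
        emitted.extend(map_task(data, block_a, block_b))
    return reduce_task(len(data), emitted)
```

For example, `correlation_matrix([[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]])` yields 1.0 between the first two rows and -1.0 between each of them and the third. Because Spearman correlation is Pearson correlation on ranks, ranking each row before calling `correlation_matrix` would give a Spearman matrix under the same scheme; Kendall correlation would need a different per-pair function in `map_task`.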