Matches in SemOpenAlex for { <https://semopenalex.org/work/W817491301> ?p ?o ?g. }
Showing items 1 to 71 of 71 with 100 items per page.
- W817491301 abstract "Many data-driven applications perform computations on large volumes of data that do not fit on a single computer. These applications typically must use parallel shared-nothing distributed software systems to perform their computations. This thesis addresses challenges in large-scale distributed data processing with a particular focus on two primary areas: (i) theoretical foundations for understanding the costs of distribution; and (ii) processing large-scale graph data. The first part of this thesis presents a theoretical framework for the MapReduce system, to analyze the cost of distribution for different problem domains, and for evaluating the 'goodness' of different algorithms. We identify a fundamental tradeoff between the parallelism and communication costs of algorithms. We first study the setting when computations are constrained to a single round of MapReduce. In this setting, we capture the cost of distributing a problem by deriving a lower-bound curve on the communication cost of any algorithm that solves the problem for different parallelism levels. We derive lower-bound curves for several problems, and prove that existing or new one-round algorithms solving these problems are optimal, i.e., incur the minimum possible communication cost for different parallelism levels. We then show that by allowing multiple rounds of MapReduce computations, we can solve problems more efficiently than any possible one-round algorithm. The second part of this thesis addresses challenges in systems for processing large-scale graph data, with the goal of making graph computation more efficient and easier to program and debug. We focus on systems that are modeled after Google's Pregel framework for large-scale distributed graph processing. We begin by describing an open-source version of Pregel we developed, called GPS (for Graph Processing System). We then describe new static and dynamic schemes for partitioning graphs across machines, and we present experimental results on the performance effects of different partitioning schemes. Next, we describe a set of algorithmic optimizations that address commonly appearing inefficiencies in algorithms programmed on Pregel-like systems. Because it can be very difficult to debug programs in Pregel-like systems, we developed a new replay-style debugger called Graft. In addition, we defined and implemented a set of high-level parallelizable graph primitives, called HelP (for High-level Primitives), as an alternative to programming graph algorithms using the low-level vertex-centric functions of existing systems. HelP primitives capture several commonly appearing operations in large-scale graph computations. We motivate and describe Graft and HelP using real-world applications and algorithms." @default.
- W817491301 created "2016-06-24" @default.
- W817491301 creator A5023876520 @default.
- W817491301 date "2015-07-01" @default.
- W817491301 modified "2023-09-26" @default.
- W817491301 title "Massive-scale Processing of Record-oriented and Graph Data" @default.
- W817491301 hasPublicationYear "2015" @default.
- W817491301 type Work @default.
- W817491301 sameAs 817491301 @default.
- W817491301 citedByCount "0" @default.
- W817491301 crossrefType "dissertation" @default.
- W817491301 hasAuthorship W817491301A5023876520 @default.
- W817491301 hasConcept C11413529 @default.
- W817491301 hasConcept C120314980 @default.
- W817491301 hasConcept C120665830 @default.
- W817491301 hasConcept C121332964 @default.
- W817491301 hasConcept C132525143 @default.
- W817491301 hasConcept C168065819 @default.
- W817491301 hasConcept C173608175 @default.
- W817491301 hasConcept C192209626 @default.
- W817491301 hasConcept C199360897 @default.
- W817491301 hasConcept C2778755073 @default.
- W817491301 hasConcept C2781172179 @default.
- W817491301 hasConcept C41008148 @default.
- W817491301 hasConcept C45374587 @default.
- W817491301 hasConcept C61483411 @default.
- W817491301 hasConcept C62520636 @default.
- W817491301 hasConcept C80444323 @default.
- W817491301 hasConceptScore W817491301C11413529 @default.
- W817491301 hasConceptScore W817491301C120314980 @default.
- W817491301 hasConceptScore W817491301C120665830 @default.
- W817491301 hasConceptScore W817491301C121332964 @default.
- W817491301 hasConceptScore W817491301C132525143 @default.
- W817491301 hasConceptScore W817491301C168065819 @default.
- W817491301 hasConceptScore W817491301C173608175 @default.
- W817491301 hasConceptScore W817491301C192209626 @default.
- W817491301 hasConceptScore W817491301C199360897 @default.
- W817491301 hasConceptScore W817491301C2778755073 @default.
- W817491301 hasConceptScore W817491301C2781172179 @default.
- W817491301 hasConceptScore W817491301C41008148 @default.
- W817491301 hasConceptScore W817491301C45374587 @default.
- W817491301 hasConceptScore W817491301C61483411 @default.
- W817491301 hasConceptScore W817491301C62520636 @default.
- W817491301 hasConceptScore W817491301C80444323 @default.
- W817491301 hasLocation W8174913011 @default.
- W817491301 hasOpenAccess W817491301 @default.
- W817491301 hasPrimaryLocation W8174913011 @default.
- W817491301 hasRelatedWork W107790154 @default.
- W817491301 hasRelatedWork W1507312539 @default.
- W817491301 hasRelatedWork W1827980970 @default.
- W817491301 hasRelatedWork W1973262089 @default.
- W817491301 hasRelatedWork W1989788024 @default.
- W817491301 hasRelatedWork W2139244173 @default.
- W817491301 hasRelatedWork W2272021717 @default.
- W817491301 hasRelatedWork W2415655736 @default.
- W817491301 hasRelatedWork W2472676307 @default.
- W817491301 hasRelatedWork W2594063632 @default.
- W817491301 hasRelatedWork W2606181798 @default.
- W817491301 hasRelatedWork W2802440256 @default.
- W817491301 hasRelatedWork W2946309116 @default.
- W817491301 hasRelatedWork W2949644438 @default.
- W817491301 hasRelatedWork W2972139238 @default.
- W817491301 hasRelatedWork W3008151513 @default.
- W817491301 hasRelatedWork W3097576103 @default.
- W817491301 hasRelatedWork W3149606206 @default.
- W817491301 hasRelatedWork W3211118578 @default.
- W817491301 hasRelatedWork W2962958005 @default.
- W817491301 isParatext "false" @default.
- W817491301 isRetracted "false" @default.
- W817491301 magId "817491301" @default.
- W817491301 workType "dissertation" @default.