Matches in SemOpenAlex for { <https://semopenalex.org/work/W3166015079> ?p ?o ?g. }
Showing items 1 to 84 of 84 with 100 items per page.
- W3166015079 endingPage "147" @default.
- W3166015079 startingPage "132" @default.
- W3166015079 abstract "In the big data era, large amounts of data are under generation and accumulation in various industries. However, users usually feel hindered by the data quality issues when extracting values from the big data. Thus, data quality issues are gaining more and more attention from data quality management analysts. Cutting-edge solutions like data ETL, data cleaning, and data quality monitoring systems have many deficiencies in capability and efficiency, making it difficult to cope with complicated situations on big data. These problems inspire us to build SparkDQ, a generic distributed data quality management model and framework that provides a series of data quality detection and repair interfaces. Users can quickly build custom tasks of data quality computing for various needs by utilizing these interfaces. In addition, SparkDQ implements a set of algorithms that run in a parallel manner with optimizations. These algorithms aim at various data quality goals. We also propose several system-level optimizations, including the job-level optimization with multi-task execution scheduling and the data-level optimization with data state caching. The experimental evaluation shows that the proposed distributed algorithms in SparkDQ run up to 12 times faster compared to the corresponding stand-alone serial and multi-thread algorithms. Compared with the cutting-edge distributed data quality solution Apache Griffin, SparkDQ has more features, and its execution time is only around half of Apache Griffin on average. SparkDQ achieves near-linear data and node scalability." @default.
- W3166015079 created "2021-06-22" @default.
- W3166015079 creator A5005377211 @default.
- W3166015079 creator A5007538828 @default.
- W3166015079 creator A5007818509 @default.
- W3166015079 creator A5034560014 @default.
- W3166015079 creator A5052175650 @default.
- W3166015079 creator A5062918181 @default.
- W3166015079 creator A5077437451 @default.
- W3166015079 date "2021-10-01" @default.
- W3166015079 modified "2023-10-16" @default.
- W3166015079 title "SparkDQ: Efficient generic big data quality management on distributed data-parallel computation" @default.
- W3166015079 cites W1995443851 @default.
- W3166015079 cites W2056748234 @default.
- W3166015079 cites W2063103859 @default.
- W3166015079 cites W2081186682 @default.
- W3166015079 cites W2083755493 @default.
- W3166015079 cites W2113607096 @default.
- W3166015079 cites W2287926972 @default.
- W3166015079 cites W2591700809 @default.
- W3166015079 cites W2963148337 @default.
- W3166015079 cites W3000214033 @default.
- W3166015079 doi "https://doi.org/10.1016/j.jpdc.2021.05.012" @default.
- W3166015079 hasPublicationYear "2021" @default.
- W3166015079 type Work @default.
- W3166015079 sameAs 3166015079 @default.
- W3166015079 citedByCount "3" @default.
- W3166015079 countsByYear W31660150792023 @default.
- W3166015079 crossrefType "journal-article" @default.
- W3166015079 hasAuthorship W3166015079A5005377211 @default.
- W3166015079 hasAuthorship W3166015079A5007538828 @default.
- W3166015079 hasAuthorship W3166015079A5007818509 @default.
- W3166015079 hasAuthorship W3166015079A5034560014 @default.
- W3166015079 hasAuthorship W3166015079A5052175650 @default.
- W3166015079 hasAuthorship W3166015079A5062918181 @default.
- W3166015079 hasAuthorship W3166015079A5077437451 @default.
- W3166015079 hasConcept C120314980 @default.
- W3166015079 hasConcept C124101348 @default.
- W3166015079 hasConcept C162324750 @default.
- W3166015079 hasConcept C1668388 @default.
- W3166015079 hasConcept C173608175 @default.
- W3166015079 hasConcept C176217482 @default.
- W3166015079 hasConcept C206729178 @default.
- W3166015079 hasConcept C21547014 @default.
- W3166015079 hasConcept C24756922 @default.
- W3166015079 hasConcept C41008148 @default.
- W3166015079 hasConcept C48044578 @default.
- W3166015079 hasConcept C75684735 @default.
- W3166015079 hasConcept C77088390 @default.
- W3166015079 hasConceptScore W3166015079C120314980 @default.
- W3166015079 hasConceptScore W3166015079C124101348 @default.
- W3166015079 hasConceptScore W3166015079C162324750 @default.
- W3166015079 hasConceptScore W3166015079C1668388 @default.
- W3166015079 hasConceptScore W3166015079C173608175 @default.
- W3166015079 hasConceptScore W3166015079C176217482 @default.
- W3166015079 hasConceptScore W3166015079C206729178 @default.
- W3166015079 hasConceptScore W3166015079C21547014 @default.
- W3166015079 hasConceptScore W3166015079C24756922 @default.
- W3166015079 hasConceptScore W3166015079C41008148 @default.
- W3166015079 hasConceptScore W3166015079C48044578 @default.
- W3166015079 hasConceptScore W3166015079C75684735 @default.
- W3166015079 hasConceptScore W3166015079C77088390 @default.
- W3166015079 hasFunder F4320321001 @default.
- W3166015079 hasFunder F4320335777 @default.
- W3166015079 hasLocation W31660150791 @default.
- W3166015079 hasOpenAccess W3166015079 @default.
- W3166015079 hasPrimaryLocation W31660150791 @default.
- W3166015079 hasRelatedWork W1569389315 @default.
- W3166015079 hasRelatedWork W1882733036 @default.
- W3166015079 hasRelatedWork W1992741870 @default.
- W3166015079 hasRelatedWork W2159682405 @default.
- W3166015079 hasRelatedWork W2160425906 @default.
- W3166015079 hasRelatedWork W2364921833 @default.
- W3166015079 hasRelatedWork W2380023786 @default.
- W3166015079 hasRelatedWork W2385146268 @default.
- W3166015079 hasRelatedWork W2546696010 @default.
- W3166015079 hasRelatedWork W2503642292 @default.
- W3166015079 hasVolume "156" @default.
- W3166015079 isParatext "false" @default.
- W3166015079 isRetracted "false" @default.
- W3166015079 magId "3166015079" @default.
- W3166015079 workType "article" @default.
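Each entry above is one quad of the form `- <subject> <predicate> <object> @<graph>.`. The object is either a quoted literal or a bare identifier. The sketch below groups the objects by predicate so that the listing can be summarized. The line format is inferred from this page only. `LINE_RE` and `parse_listing` are hypothetical names and are not part of any SemOpenAlex API. The usage example also assumes that `startingPage` and `endingPage` give an inclusive page range.

```python
import re
from collections import defaultdict

# Assumed line shape (inferred from this listing, not a published format):
#   "- <subject> <predicate> <object> @<graph>."
# The object is either a double-quoted literal (which may contain spaces)
# or a bare token such as an author ID like A5005377211.
LINE_RE = re.compile(r'^- (\S+) (\S+) ("(?:[^"\\]|\\.)*"|\S+) @(\S+)\.$')


def parse_listing(lines):
    """Hypothetical helper: map each predicate to the list of its objects."""
    by_predicate = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip header and pagination lines
        _subject, predicate, obj, _graph = match.groups()
        # Drop surrounding quotes from literals; bare IDs are unchanged.
        by_predicate[predicate].append(obj.strip('"'))
    return dict(by_predicate)


sample = [
    "Showing items 1 to 84 of 84 with 100 items per page.",
    '- W3166015079 endingPage "147" @default.',
    '- W3166015079 startingPage "132" @default.',
    "- W3166015079 creator A5005377211 @default.",
    "- W3166015079 creator A5007538828 @default.",
]
triples = parse_listing(sample)

# Inclusive page range: 147 - 132 + 1
pages = int(triples["endingPage"][0]) - int(triples["startingPage"][0]) + 1
print(pages)
print(len(triples["creator"]))
```

Run over the full listing, the same grouping gives 7 `creator` values, 10 `cites` values, and 13 `hasConcept` values. It also gives a page count of 147 − 132 + 1 = 16.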