Matches in SemOpenAlex for { <https://semopenalex.org/work/W2754770672> ?p ?o ?g. }
- W2754770672 abstract "We introduce and study a new data sketch for processing massive datasets. It addresses two common problems: 1) computing a sum given arbitrary filter conditions and 2) identifying the frequent items or heavy hitters in a data set. For the former, the sketch provides unbiased estimates with state-of-the-art accuracy. It handles the challenging scenario when the data is disaggregated so that computing the per-unit metric of interest requires an expensive aggregation. For example, the metric of interest may be total clicks per user while the raw data is a click stream with multiple rows per user. Thus the sketch is suitable for use in a wide range of applications, including computing historical click-through rates for ad prediction, reporting user metrics from event streams, and measuring network traffic for IP flows. We prove and empirically show the sketch has good properties for both the disaggregated subset sum estimation and frequent item problems. On i.i.d. data, it not only picks out the frequent items but gives strongly consistent estimates for the proportion of each frequent item. The resulting sketch asymptotically draws a probability-proportional-to-size sample that is optimal for estimating sums over the data. For non-i.i.d. data, we show that it typically does much better than random sampling for the frequent item problem and never does worse. For subset sum estimation, we show that even for pathological sequences, the variance is close to that of an optimal sampling design. Empirically, despite the disadvantage of operating on disaggregated data, our method matches or bests priority sampling, a state-of-the-art method for pre-aggregated data, and performs orders of magnitude better on skewed data compared to uniform sampling. We propose extensions to the sketch that allow it to be used in combining multiple data sets, in distributed systems, and for time-decayed aggregation." @default.
- W2754770672 created "2017-09-25" @default.
- W2754770672 creator A5090993570 @default.
- W2754770672 date "2017-09-12" @default.
- W2754770672 modified "2023-09-25" @default.
- W2754770672 title "Data Sketches for Disaggregated Subset Sum and Frequent Item Estimation" @default.
- W2754770672 cites W1553409264 @default.
- W2754770672 cites W1675727887 @default.
- W2754770672 cites W1848389068 @default.
- W2754770672 cites W1937109390 @default.
- W2754770672 cites W1972076792 @default.
- W2754770672 cites W1979819093 @default.
- W2754770672 cites W1993482412 @default.
- W2754770672 cites W2006355640 @default.
- W2754770672 cites W2033612074 @default.
- W2754770672 cites W2036304306 @default.
- W2754770672 cites W2080234606 @default.
- W2754770672 cites W2080745194 @default.
- W2754770672 cites W2085845250 @default.
- W2754770672 cites W2088424151 @default.
- W2754770672 cites W2090883204 @default.
- W2754770672 cites W2092236286 @default.
- W2754770672 cites W2113139394 @default.
- W2754770672 cites W2113227443 @default.
- W2754770672 cites W2169927847 @default.
- W2754770672 cites W2255614306 @default.
- W2754770672 cites W2294895103 @default.
- W2754770672 cites W2963496590 @default.
- W2754770672 cites W59037818 @default.
- W2754770672 doi "https://doi.org/10.48550/arxiv.1709.04048" @default.
- W2754770672 hasPublicationYear "2017" @default.
- W2754770672 type Work @default.
- W2754770672 sameAs 2754770672 @default.
- W2754770672 citedByCount "0" @default.
- W2754770672 crossrefType "posted-content" @default.
- W2754770672 hasAuthorship W2754770672A5090993570 @default.
- W2754770672 hasBestOaLocation W27547706721 @default.
- W2754770672 hasConcept C106131492 @default.
- W2754770672 hasConcept C11413529 @default.
- W2754770672 hasConcept C121332964 @default.
- W2754770672 hasConcept C121955636 @default.
- W2754770672 hasConcept C124101348 @default.
- W2754770672 hasConcept C132964779 @default.
- W2754770672 hasConcept C135598885 @default.
- W2754770672 hasConcept C140779682 @default.
- W2754770672 hasConcept C144133560 @default.
- W2754770672 hasConcept C159985019 @default.
- W2754770672 hasConcept C162324750 @default.
- W2754770672 hasConcept C176217482 @default.
- W2754770672 hasConcept C177264268 @default.
- W2754770672 hasConcept C185592680 @default.
- W2754770672 hasConcept C192562407 @default.
- W2754770672 hasConcept C196083921 @default.
- W2754770672 hasConcept C198531522 @default.
- W2754770672 hasConcept C199360897 @default.
- W2754770672 hasConcept C204323151 @default.
- W2754770672 hasConcept C21547014 @default.
- W2754770672 hasConcept C2779231336 @default.
- W2754770672 hasConcept C2779662365 @default.
- W2754770672 hasConcept C31972630 @default.
- W2754770672 hasConcept C41008148 @default.
- W2754770672 hasConcept C43617362 @default.
- W2754770672 hasConcept C62520636 @default.
- W2754770672 hasConcept C77088390 @default.
- W2754770672 hasConceptScore W2754770672C106131492 @default.
- W2754770672 hasConceptScore W2754770672C11413529 @default.
- W2754770672 hasConceptScore W2754770672C121332964 @default.
- W2754770672 hasConceptScore W2754770672C121955636 @default.
- W2754770672 hasConceptScore W2754770672C124101348 @default.
- W2754770672 hasConceptScore W2754770672C132964779 @default.
- W2754770672 hasConceptScore W2754770672C135598885 @default.
- W2754770672 hasConceptScore W2754770672C140779682 @default.
- W2754770672 hasConceptScore W2754770672C144133560 @default.
- W2754770672 hasConceptScore W2754770672C159985019 @default.
- W2754770672 hasConceptScore W2754770672C162324750 @default.
- W2754770672 hasConceptScore W2754770672C176217482 @default.
- W2754770672 hasConceptScore W2754770672C177264268 @default.
- W2754770672 hasConceptScore W2754770672C185592680 @default.
- W2754770672 hasConceptScore W2754770672C192562407 @default.
- W2754770672 hasConceptScore W2754770672C196083921 @default.
- W2754770672 hasConceptScore W2754770672C198531522 @default.
- W2754770672 hasConceptScore W2754770672C199360897 @default.
- W2754770672 hasConceptScore W2754770672C204323151 @default.
- W2754770672 hasConceptScore W2754770672C21547014 @default.
- W2754770672 hasConceptScore W2754770672C2779231336 @default.
- W2754770672 hasConceptScore W2754770672C2779662365 @default.
- W2754770672 hasConceptScore W2754770672C31972630 @default.
- W2754770672 hasConceptScore W2754770672C41008148 @default.
- W2754770672 hasConceptScore W2754770672C43617362 @default.
- W2754770672 hasConceptScore W2754770672C62520636 @default.
- W2754770672 hasConceptScore W2754770672C77088390 @default.
- W2754770672 hasLocation W27547706721 @default.
- W2754770672 hasLocation W27547706722 @default.
- W2754770672 hasOpenAccess W2754770672 @default.
- W2754770672 hasPrimaryLocation W27547706721 @default.
- W2754770672 hasRelatedWork W2006716391 @default.
- W2754770672 hasRelatedWork W2026039762 @default.
- W2754770672 hasRelatedWork W2177890446 @default.
- W2754770672 hasRelatedWork W2251677563 @default.
- W2754770672 hasRelatedWork W2600707098 @default.