Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387634985> ?p ?o ?g. }
Showing items 1 to 53 of 53 with 100 items per page.
- W4387634985 abstract "Analytical theories suggest that higher-quality data can lead to lower test errors in models trained on a fixed data budget. Moreover, a model can be trained on a lower compute budget without compromising performance if a dataset can be stripped of its redundancies. Coreset selection (or data pruning) seeks to select a subset of the training data so as to maximize the performance of models trained on this subset, also referred to as coreset. There are two dominant approaches: (1) geometry-based data selection for maximizing data diversity in the coreset, and (2) functions that assign difficulty scores to samples based on training dynamics. Optimizing for data diversity leads to a coreset that is biased towards easier samples, whereas, selection by difficulty ranking omits easy samples that are necessary for the training of deep learning models. This demonstrates that data diversity and importance scores are two complementary factors that need to be jointly considered during coreset selection. We represent a dataset as an undirected graph and propose a novel pruning algorithm, D2 Pruning, that uses forward and reverse message passing over this dataset graph for coreset selection. D2 Pruning updates the difficulty scores of each example by incorporating the difficulty of its neighboring examples in the dataset graph. Then, these updated difficulty scores direct a graph-based sampling method to select a coreset that encapsulates both diverse and difficult regions of the dataset space. We evaluate supervised and self-supervised versions of our method on various vision and language datasets. Results show that D2 Pruning improves coreset selection over previous state-of-the-art methods for up to 70% pruning rates. Additionally, we find that using D2 Pruning for filtering large multimodal datasets leads to increased diversity in the dataset and improved generalization of pretrained models." @default.
- W4387634985 created "2023-10-14" @default.
- W4387634985 creator A5001987532 @default.
- W4387634985 creator A5048025332 @default.
- W4387634985 creator A5078881867 @default.
- W4387634985 date "2023-10-11" @default.
- W4387634985 modified "2023-10-15" @default.
- W4387634985 title "D2 Pruning: Message Passing for Balancing Diversity and Difficulty in Data Pruning" @default.
- W4387634985 doi "https://doi.org/10.48550/arxiv.2310.07931" @default.
- W4387634985 hasPublicationYear "2023" @default.
- W4387634985 type Work @default.
- W4387634985 citedByCount "0" @default.
- W4387634985 crossrefType "posted-content" @default.
- W4387634985 hasAuthorship W4387634985A5001987532 @default.
- W4387634985 hasAuthorship W4387634985A5048025332 @default.
- W4387634985 hasAuthorship W4387634985A5078881867 @default.
- W4387634985 hasBestOaLocation W43876349851 @default.
- W4387634985 hasConcept C108010975 @default.
- W4387634985 hasConcept C119857082 @default.
- W4387634985 hasConcept C132525143 @default.
- W4387634985 hasConcept C154945302 @default.
- W4387634985 hasConcept C189430467 @default.
- W4387634985 hasConcept C41008148 @default.
- W4387634985 hasConcept C6557445 @default.
- W4387634985 hasConcept C80444323 @default.
- W4387634985 hasConcept C81917197 @default.
- W4387634985 hasConcept C86803240 @default.
- W4387634985 hasConceptScore W4387634985C108010975 @default.
- W4387634985 hasConceptScore W4387634985C119857082 @default.
- W4387634985 hasConceptScore W4387634985C132525143 @default.
- W4387634985 hasConceptScore W4387634985C154945302 @default.
- W4387634985 hasConceptScore W4387634985C189430467 @default.
- W4387634985 hasConceptScore W4387634985C41008148 @default.
- W4387634985 hasConceptScore W4387634985C6557445 @default.
- W4387634985 hasConceptScore W4387634985C80444323 @default.
- W4387634985 hasConceptScore W4387634985C81917197 @default.
- W4387634985 hasConceptScore W4387634985C86803240 @default.
- W4387634985 hasLocation W43876349851 @default.
- W4387634985 hasOpenAccess W4387634985 @default.
- W4387634985 hasPrimaryLocation W43876349851 @default.
- W4387634985 hasRelatedWork W1494981348 @default.
- W4387634985 hasRelatedWork W2040498587 @default.
- W4387634985 hasRelatedWork W2145253956 @default.
- W4387634985 hasRelatedWork W2355171581 @default.
- W4387634985 hasRelatedWork W2373300491 @default.
- W4387634985 hasRelatedWork W2378744544 @default.
- W4387634985 hasRelatedWork W2379704676 @default.
- W4387634985 hasRelatedWork W2384505857 @default.
- W4387634985 hasRelatedWork W4206442282 @default.
- W4387634985 hasRelatedWork W2594301978 @default.
- W4387634985 isParatext "false" @default.
- W4387634985 isRetracted "false" @default.
- W4387634985 workType "article" @default.