Matches in SemOpenAlex for { <https://semopenalex.org/work/W2896711116> ?p ?o ?g. }
Showing items 1 to 77 of 77 with 100 items per page.
- W2896711116 abstract "Carrying out business processes successfully is closely linked to the quality of the data inventory in an organization. Deficiencies in data quality lead to problems: Incorrect address data prevents (timely) shipments to customers. Erroneous orders lead to returns and thus to unnecessary effort. Wrong pricing forces companies to miss out on revenues or to impair customer satisfaction. If orders or customer records cannot be retrieved, complaint management takes longer. Due to erroneous inventories, too few or too many supplies might be reordered. A special problem with data quality and the reason for many of the issues mentioned above are duplicates in databases. Duplicates are different representations of the same real-world objects in a dataset. However, these representations differ from each other and are for that reason hard to match by a computer. Moreover, the number of required comparisons to find those duplicates grows with the square of the dataset size. To cleanse the data, these duplicates must be detected and removed. Duplicate detection is a very laborious process. To achieve satisfactory results, appropriate software must be created and configured (similarity measures, partitioning keys, thresholds, etc.). Both require much manual effort and experience. This thesis addresses automation of parameter selection for duplicate detection and presents several novel approaches that eliminate the need for human experience in parts of the duplicate detection process. A pre-processing step is introduced that analyzes the datasets in question and classifies their attributes semantically. Not only do these annotations help in understanding the respective datasets, but they also facilitate subsequent steps, for example, by selecting appropriate similarity measures or normalizing the data upfront. This approach works without schema information. Following that, we show a partitioning technique that strongly reduces the number of pair comparisons for the duplicate detection process. The approach automatically finds particularly suitable partitioning keys that simultaneously allow for effective and efficient duplicate retrieval. By means of a user study, we demonstrate that this technique finds partitioning keys that outperform expert suggestions and additionally does not need manual configuration. Furthermore, this approach can be applied independently of the attribute types. To measure the success of a duplicate detection process and to execute the described partitioning approach, a gold standard is required that provides information about the actual duplicates in a training dataset. This thesis presents a technique that uses existing duplicate detection results and crowdsourcing to create a near gold standard that can be used for the purposes above. Another part of the thesis describes and evaluates strategies to reduce these crowdsourcing costs and to achieve a consensus with less effort." @default.
- W2896711116 created "2018-10-26" @default.
- W2896711116 creator A5072688774 @default.
- W2896711116 date "2018-01-01" @default.
- W2896711116 modified "2023-09-24" @default.
- W2896711116 title "Self-adaptive data quality" @default.
- W2896711116 hasPublicationYear "2018" @default.
- W2896711116 type Work @default.
- W2896711116 sameAs 2896711116 @default.
- W2896711116 citedByCount "0" @default.
- W2896711116 crossrefType "journal-article" @default.
- W2896711116 hasAuthorship W2896711116A5072688774 @default.
- W2896711116 hasConcept C111472728 @default.
- W2896711116 hasConcept C111919701 @default.
- W2896711116 hasConcept C115901376 @default.
- W2896711116 hasConcept C121955636 @default.
- W2896711116 hasConcept C124101348 @default.
- W2896711116 hasConcept C127413603 @default.
- W2896711116 hasConcept C138885662 @default.
- W2896711116 hasConcept C144133560 @default.
- W2896711116 hasConcept C176217482 @default.
- W2896711116 hasConcept C195487862 @default.
- W2896711116 hasConcept C21547014 @default.
- W2896711116 hasConcept C23123220 @default.
- W2896711116 hasConcept C24756922 @default.
- W2896711116 hasConcept C2522767166 @default.
- W2896711116 hasConcept C2779530757 @default.
- W2896711116 hasConcept C41008148 @default.
- W2896711116 hasConcept C77088390 @default.
- W2896711116 hasConcept C78519656 @default.
- W2896711116 hasConcept C98045186 @default.
- W2896711116 hasConceptScore W2896711116C111472728 @default.
- W2896711116 hasConceptScore W2896711116C111919701 @default.
- W2896711116 hasConceptScore W2896711116C115901376 @default.
- W2896711116 hasConceptScore W2896711116C121955636 @default.
- W2896711116 hasConceptScore W2896711116C124101348 @default.
- W2896711116 hasConceptScore W2896711116C127413603 @default.
- W2896711116 hasConceptScore W2896711116C138885662 @default.
- W2896711116 hasConceptScore W2896711116C144133560 @default.
- W2896711116 hasConceptScore W2896711116C176217482 @default.
- W2896711116 hasConceptScore W2896711116C195487862 @default.
- W2896711116 hasConceptScore W2896711116C21547014 @default.
- W2896711116 hasConceptScore W2896711116C23123220 @default.
- W2896711116 hasConceptScore W2896711116C24756922 @default.
- W2896711116 hasConceptScore W2896711116C2522767166 @default.
- W2896711116 hasConceptScore W2896711116C2779530757 @default.
- W2896711116 hasConceptScore W2896711116C41008148 @default.
- W2896711116 hasConceptScore W2896711116C77088390 @default.
- W2896711116 hasConceptScore W2896711116C78519656 @default.
- W2896711116 hasConceptScore W2896711116C98045186 @default.
- W2896711116 hasLocation W28967111161 @default.
- W2896711116 hasOpenAccess W2896711116 @default.
- W2896711116 hasPrimaryLocation W28967111161 @default.
- W2896711116 hasRelatedWork W1235128746 @default.
- W2896711116 hasRelatedWork W2050867684 @default.
- W2896711116 hasRelatedWork W2067566391 @default.
- W2896711116 hasRelatedWork W2160997853 @default.
- W2896711116 hasRelatedWork W2182988128 @default.
- W2896711116 hasRelatedWork W2221286410 @default.
- W2896711116 hasRelatedWork W2257216439 @default.
- W2896711116 hasRelatedWork W2294284203 @default.
- W2896711116 hasRelatedWork W2750888279 @default.
- W2896711116 hasRelatedWork W2770576712 @default.
- W2896711116 hasRelatedWork W2798896881 @default.
- W2896711116 hasRelatedWork W2805143279 @default.
- W2896711116 hasRelatedWork W2896587760 @default.
- W2896711116 hasRelatedWork W2970491965 @default.
- W2896711116 hasRelatedWork W3004596355 @default.
- W2896711116 hasRelatedWork W3110475619 @default.
- W2896711116 hasRelatedWork W3210246636 @default.
- W2896711116 hasRelatedWork W54395699 @default.
- W2896711116 hasRelatedWork W54840664 @default.
- W2896711116 hasRelatedWork W2182469692 @default.
- W2896711116 isParatext "false" @default.
- W2896711116 isRetracted "false" @default.
- W2896711116 magId "2896711116" @default.
- W2896711116 workType "article" @default.