Matches in SemOpenAlex for { <https://semopenalex.org/work/W2897914812> ?p ?o ?g. }
- W2897914812 abstract "Data de-duplication is the task of detecting multiple records that correspond to the same real-world entity in a database. In this work, we view de-duplication as a clustering problem where the goal is to put records corresponding to the same physical entity in the same cluster and records corresponding to different physical entities in different clusters. We introduce a framework which we call promise correlation clustering. Given a complete graph $G$ with the edges labelled $0$ and $1$, the goal is to find a clustering that minimizes the number of $0$ edges within a cluster plus the number of $1$ edges across different clusters (the correlation loss). The optimal clustering can also be viewed as a complete graph $G^*$ with edges corresponding to points in the same cluster being labelled $1$ and other edges being labelled $0$. Under the promise that the edge difference between $G$ and $G^*$ is small, we prove that finding the optimal clustering (or $G^*$) is still NP-Hard. [Ashtiani et al., 2016] introduced the framework of semi-supervised clustering, where the learning algorithm has access to an oracle, which answers whether two points belong to the same or different clusters. We further prove that even with access to a same-cluster oracle, the promise version is NP-Hard as long as the number of queries to the oracle is not too large ($o(n)$, where $n$ is the number of vertices). Given these negative results, we consider a restricted version of correlation clustering. As before, the goal is to find a clustering that minimizes the correlation loss. However, we restrict ourselves to a given class $\mathcal{F}$ of clusterings. We offer a semi-supervised algorithmic approach to solve the restricted variant with success guarantees." @default.
- W2897914812 created "2018-10-26" @default.
- W2897914812 creator A5000141065 @default.
- W2897914812 creator A5010258967 @default.
- W2897914812 creator A5046673910 @default.
- W2897914812 date "2018-10-10" @default.
- W2897914812 modified "2023-09-27" @default.
- W2897914812 title "Semi-supervised clustering for de-duplication" @default.
- W2897914812 hasPublicationYear "2018" @default.
- W2897914812 type Work @default.
- W2897914812 sameAs 2897914812 @default.
- W2897914812 citedByCount "0" @default.
- W2897914812 crossrefType "posted-content" @default.
- W2897914812 hasAuthorship W2897914812A5000141065 @default.
- W2897914812 hasAuthorship W2897914812A5010258967 @default.
- W2897914812 hasAuthorship W2897914812A5046673910 @default.
- W2897914812 hasConcept C114614502 @default.
- W2897914812 hasConcept C115903868 @default.
- W2897914812 hasConcept C124101348 @default.
- W2897914812 hasConcept C132525143 @default.
- W2897914812 hasConcept C153180895 @default.
- W2897914812 hasConcept C154945302 @default.
- W2897914812 hasConcept C164866538 @default.
- W2897914812 hasConcept C199360897 @default.
- W2897914812 hasConcept C23822008 @default.
- W2897914812 hasConcept C33704608 @default.
- W2897914812 hasConcept C33923547 @default.
- W2897914812 hasConcept C41008148 @default.
- W2897914812 hasConcept C55166926 @default.
- W2897914812 hasConcept C73555534 @default.
- W2897914812 hasConcept C80444323 @default.
- W2897914812 hasConcept C94641424 @default.
- W2897914812 hasConceptScore W2897914812C114614502 @default.
- W2897914812 hasConceptScore W2897914812C115903868 @default.
- W2897914812 hasConceptScore W2897914812C124101348 @default.
- W2897914812 hasConceptScore W2897914812C132525143 @default.
- W2897914812 hasConceptScore W2897914812C153180895 @default.
- W2897914812 hasConceptScore W2897914812C154945302 @default.
- W2897914812 hasConceptScore W2897914812C164866538 @default.
- W2897914812 hasConceptScore W2897914812C199360897 @default.
- W2897914812 hasConceptScore W2897914812C23822008 @default.
- W2897914812 hasConceptScore W2897914812C33704608 @default.
- W2897914812 hasConceptScore W2897914812C33923547 @default.
- W2897914812 hasConceptScore W2897914812C41008148 @default.
- W2897914812 hasConceptScore W2897914812C55166926 @default.
- W2897914812 hasConceptScore W2897914812C73555534 @default.
- W2897914812 hasConceptScore W2897914812C80444323 @default.
- W2897914812 hasConceptScore W2897914812C94641424 @default.
- W2897914812 hasLocation W28979148121 @default.
- W2897914812 hasOpenAccess W2897914812 @default.
- W2897914812 hasPrimaryLocation W28979148121 @default.
- W2897914812 hasRelatedWork W1572400628 @default.
- W2897914812 hasRelatedWork W2100721890 @default.
- W2897914812 hasRelatedWork W2110578405 @default.
- W2897914812 hasRelatedWork W2395693708 @default.
- W2897914812 hasRelatedWork W2560576693 @default.
- W2897914812 hasRelatedWork W2724958150 @default.
- W2897914812 hasRelatedWork W2773259269 @default.
- W2897914812 hasRelatedWork W2900031086 @default.
- W2897914812 hasRelatedWork W2938185404 @default.
- W2897914812 hasRelatedWork W2944876086 @default.
- W2897914812 hasRelatedWork W2951687014 @default.
- W2897914812 hasRelatedWork W2952712772 @default.
- W2897914812 hasRelatedWork W2962845353 @default.
- W2897914812 hasRelatedWork W2963907726 @default.
- W2897914812 hasRelatedWork W2975614804 @default.
- W2897914812 hasRelatedWork W3033291176 @default.
- W2897914812 hasRelatedWork W3102473470 @default.
- W2897914812 hasRelatedWork W3106228791 @default.
- W2897914812 hasRelatedWork W3213393422 @default.
- W2897914812 hasRelatedWork W47316739 @default.
- W2897914812 isParatext "false" @default.
- W2897914812 isRetracted "false" @default.
- W2897914812 magId "2897914812" @default.
- W2897914812 workType "article" @default.
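
The correlation loss defined in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `correlation_loss`, the edge-label dictionary, and the four-vertex example are all hypothetical, chosen only to show how the two error types (a $0$ edge inside a cluster, a $1$ edge across clusters) are counted.

```python
from itertools import combinations


def correlation_loss(labels, clusters):
    """Count disagreements between edge labels and a clustering.

    labels:   dict mapping frozenset({u, v}) -> 0 or 1 for every vertex pair
              (hypothetical representation of the complete graph G).
    clusters: dict mapping each vertex -> cluster id.
    """
    loss = 0
    for u, v in combinations(sorted(clusters), 2):
        same_cluster = clusters[u] == clusters[v]
        label = labels[frozenset((u, v))]
        if same_cluster and label == 0:
            loss += 1  # a 0 edge inside a cluster
        elif not same_cluster and label == 1:
            loss += 1  # a 1 edge across different clusters
    return loss


# Hypothetical complete graph on four records: a-b, c-d and b-c are labelled 1.
vertices = ["a", "b", "c", "d"]
ones = {frozenset(("a", "b")), frozenset(("c", "d")), frozenset(("b", "c"))}
labels = {
    frozenset(pair): (1 if frozenset(pair) in ones else 0)
    for pair in combinations(vertices, 2)
}

two_clusters = {"a": 0, "b": 0, "c": 1, "d": 1}
one_cluster = {v: 0 for v in vertices}

print(correlation_loss(labels, two_clusters))  # only b-c disagrees
print(correlation_loss(labels, one_cluster))   # a-c, a-d, b-d disagree
```

In this toy instance the clustering {a, b}, {c, d} disagrees with the labels on one edge, while merging everything into a single cluster disagrees on three. Enumerating all clusterings this way is only feasible for tiny inputs, which is in line with the hardness results the abstract reports.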