Matches in SemOpenAlex for { <https://semopenalex.org/work/W1972495172> ?p ?o ?g. }
Showing items 1 to 97 of 97 with 100 items per page.
- W1972495172 abstract "It is well known today that pages on the Web contain a large number of content-rich relational tables. Such tables have been systematically extracted in a number of efforts to empower important applications such as table search and schema discovery. However, a significant fraction of relational tables are not embedded in the standard HTML table tags, and are thus difficult to extract. In particular, a large number of relational tables are known to be in a 'list' form, which contains a list of clearly separated rows that are not separated into columns. In this work, we address the important problem of automatically extracting multi-column relational tables from such lists. Our key intuition lies in the simple observation that in correctly-extracted tables, values in the same column are coherent, both at a syntactic and at a semantic level. Using a background corpus of over 100 million tables crawled from the Web, we quantify semantic coherence based on a statistical measure of value co-occurrence in the same column from the corpus. We then model table extraction as a principled optimization problem: we allocate tokens in each row sequentially to a fixed number of columns, such that the sum of coherence across all pairs of values in the same column is maximized. Borrowing ideas from A* search and metric distance, we develop an efficient 2-approximation algorithm. We conduct large-scale table extraction experiments using both real Web data and proprietary enterprise spreadsheet data. Our approach considerably outperforms the state-of-the-art approaches in terms of quality, achieving over 90% F-measure across many cases." @default.
- W1972495172 created "2016-06-24" @default.
- W1972495172 creator A5028305892 @default.
- W1972495172 creator A5034908019 @default.
- W1972495172 creator A5068741709 @default.
- W1972495172 creator A5080534589 @default.
- W1972495172 date "2015-05-27" @default.
- W1972495172 modified "2023-09-27" @default.
- W1972495172 title "TEGRA" @default.
- W1972495172 cites W1981031568 @default.
- W1972495172 cites W1993141752 @default.
- W1972495172 cites W1994698869 @default.
- W1972495172 cites W1996505782 @default.
- W1972495172 cites W2029873015 @default.
- W1972495172 cites W2050802973 @default.
- W1972495172 cites W2053663417 @default.
- W1972495172 cites W2093559286 @default.
- W1972495172 cites W2093752301 @default.
- W1972495172 cites W2094728533 @default.
- W1972495172 cites W2103931177 @default.
- W1972495172 cites W2104042955 @default.
- W1972495172 cites W2108223890 @default.
- W1972495172 cites W2124410446 @default.
- W1972495172 cites W2135767707 @default.
- W1972495172 cites W2140116426 @default.
- W1972495172 cites W2145007893 @default.
- W1972495172 cites W2146105230 @default.
- W1972495172 cites W236085609 @default.
- W1972495172 cites W4233527139 @default.
- W1972495172 cites W4255011371 @default.
- W1972495172 doi "https://doi.org/10.1145/2723372.2723725" @default.
- W1972495172 hasPublicationYear "2015" @default.
- W1972495172 type Work @default.
- W1972495172 sameAs 1972495172 @default.
- W1972495172 citedByCount "24" @default.
- W1972495172 countsByYear W19724951722016 @default.
- W1972495172 countsByYear W19724951722017 @default.
- W1972495172 countsByYear W19724951722018 @default.
- W1972495172 countsByYear W19724951722019 @default.
- W1972495172 countsByYear W19724951722020 @default.
- W1972495172 countsByYear W19724951722021 @default.
- W1972495172 crossrefType "proceedings-article" @default.
- W1972495172 hasAuthorship W1972495172A5028305892 @default.
- W1972495172 hasAuthorship W1972495172A5034908019 @default.
- W1972495172 hasAuthorship W1972495172A5068741709 @default.
- W1972495172 hasAuthorship W1972495172A5080534589 @default.
- W1972495172 hasConcept C104140500 @default.
- W1972495172 hasConcept C124101348 @default.
- W1972495172 hasConcept C126042441 @default.
- W1972495172 hasConcept C135598885 @default.
- W1972495172 hasConcept C23123220 @default.
- W1972495172 hasConcept C2780551164 @default.
- W1972495172 hasConcept C41008148 @default.
- W1972495172 hasConcept C45235069 @default.
- W1972495172 hasConcept C52146309 @default.
- W1972495172 hasConcept C5655090 @default.
- W1972495172 hasConcept C76155785 @default.
- W1972495172 hasConcept C77088390 @default.
- W1972495172 hasConceptScore W1972495172C104140500 @default.
- W1972495172 hasConceptScore W1972495172C124101348 @default.
- W1972495172 hasConceptScore W1972495172C126042441 @default.
- W1972495172 hasConceptScore W1972495172C135598885 @default.
- W1972495172 hasConceptScore W1972495172C23123220 @default.
- W1972495172 hasConceptScore W1972495172C2780551164 @default.
- W1972495172 hasConceptScore W1972495172C41008148 @default.
- W1972495172 hasConceptScore W1972495172C45235069 @default.
- W1972495172 hasConceptScore W1972495172C52146309 @default.
- W1972495172 hasConceptScore W1972495172C5655090 @default.
- W1972495172 hasConceptScore W1972495172C76155785 @default.
- W1972495172 hasConceptScore W1972495172C77088390 @default.
- W1972495172 hasLocation W19724951721 @default.
- W1972495172 hasOpenAccess W1972495172 @default.
- W1972495172 hasPrimaryLocation W19724951721 @default.
- W1972495172 hasRelatedWork W1759560309 @default.
- W1972495172 hasRelatedWork W1969621019 @default.
- W1972495172 hasRelatedWork W1976022204 @default.
- W1972495172 hasRelatedWork W2020022499 @default.
- W1972495172 hasRelatedWork W2022166150 @default.
- W1972495172 hasRelatedWork W2092364718 @default.
- W1972495172 hasRelatedWork W2106895292 @default.
- W1972495172 hasRelatedWork W2108223890 @default.
- W1972495172 hasRelatedWork W2135767707 @default.
- W1972495172 hasRelatedWork W2148317291 @default.
- W1972495172 hasRelatedWork W2260484439 @default.
- W1972495172 hasRelatedWork W2270384880 @default.
- W1972495172 hasRelatedWork W2574230393 @default.
- W1972495172 hasRelatedWork W2752618741 @default.
- W1972495172 hasRelatedWork W2788809149 @default.
- W1972495172 hasRelatedWork W2808636123 @default.
- W1972495172 hasRelatedWork W3086759766 @default.
- W1972495172 hasRelatedWork W3160390043 @default.
- W1972495172 hasRelatedWork W3193561560 @default.
- W1972495172 hasRelatedWork W2821006098 @default.
- W1972495172 isParatext "false" @default.
- W1972495172 isRetracted "false" @default.
- W1972495172 magId "1972495172" @default.
- W1972495172 workType "article" @default.
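The abstract frames list-to-table extraction as an optimization: split each row's tokens into a fixed number k of columns so that the summed pairwise coherence of values sharing a column is maximized. The sketch below illustrates only that objective, under loudly stated assumptions. The coherence function (`make_corpus_coherence`) is a hypothetical toy that mixes a numeric/non-numeric type check with plain co-occurrence in a tiny made-up corpus, not the paper's statistical measure over 100 million Web tables. Every cell is assumed to be a non-empty run of contiguous tokens. The search is exhaustive (exponential in the number of rows), not the A*-inspired 2-approximation the abstract describes. All function names are illustrative.

```python
from itertools import combinations, product
from typing import Callable, List, Sequence, Tuple

Cells = Tuple[str, ...]
Coherence = Callable[[str, str], float]


def segmentations(tokens: Sequence[str], k: int) -> List[Cells]:
    """All ways to split a row's tokens into k contiguous, non-empty cells."""
    n = len(tokens)
    result = []
    # Choose k-1 cut positions among the n-1 gaps between adjacent tokens.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        result.append(tuple(" ".join(tokens[a:b]) for a, b in zip(bounds, bounds[1:])))
    return result


def is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def make_corpus_coherence(corpus_columns: List[List[str]]) -> Coherence:
    """Hypothetical coherence: a syntactic term plus a column co-occurrence term."""
    column_sets = [set(col) for col in corpus_columns]

    def coherence(a: str, b: str) -> float:
        # Syntactic part (assumption): both values numeric or both non-numeric.
        score = 0.5 if is_number(a) == is_number(b) else 0.0
        # Semantic part (assumption): both values occur in some corpus column.
        if any(a in col and b in col for col in column_sets):
            score += 1.0
        return score

    return coherence


def table_score(table: Sequence[Cells], coherence: Coherence) -> float:
    """Sum of coherence over all pairs of values that share a column."""
    total = 0.0
    for col in range(len(table[0])):
        for row_a, row_b in combinations(table, 2):
            total += coherence(row_a[col], row_b[col])
    return total


def extract_table(rows: List[List[str]], k: int, coherence: Coherence) -> List[Cells]:
    """Exact exhaustive search over every per-row segmentation (small inputs only)."""
    if k < 1 or any(len(row) < k for row in rows):
        raise ValueError("k must be >= 1 and every row needs at least k tokens")
    if not rows:
        return []
    candidates = [segmentations(row, k) for row in rows]
    best, best_score = [], float("-inf")
    for table in product(*candidates):
        score = table_score(table, coherence)
        if score > best_score:  # ties keep the first candidate found
            best, best_score = list(table), score
    return best


if __name__ == "__main__":
    corpus = [  # hypothetical background corpus of table columns
        ["Seattle", "San Francisco", "Boston", "Chicago"],
        ["Washington", "California", "Massachusetts", "Illinois"],
    ]
    rows = [line.split() for line in [
        "Seattle Washington 98101",
        "San Francisco California 94103",
        "Boston Massachusetts 02108",
    ]]
    print(extract_table(rows, 3, make_corpus_coherence(corpus)))
```

On these toy rows, the co-occurrence term is what makes the search keep "San Francisco" together as one cell. With only the type check, the splits "San | Francisco California" and "San Francisco | California" would score the same.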