Matches in SemOpenAlex for { <https://semopenalex.org/work/W3162130132> ?p ?o ?g. }
Showing items 1 to 87 of 87 with 100 items per page.
- W3162130132 endingPage "68" @default.
- W3162130132 startingPage "49" @default.
- W3162130132 abstract "Extracting data from user-friendly HTML tables is difficult because of their different layouts, formats, and encoding problems. In this article, we present a new proposal that first applies several pre-processing heuristics to clean the tables, then performs functional analysis, and finally applies some post-processing heuristics to produce the output. Our most important contribution is regarding functional analysis, which we address by projecting the cells onto a high-dimensional feature space in which a standard clustering technique is used to tell the meta-data cells apart from the data cells. We experimented with two large repositories of real-world HTML tables and our results confirm that our proposal can extract data from them with an F1 score of 89.50% in just 0.09 CPU seconds per table. We compared our proposal with several competitors and the statistical analysis confirmed its superiority in terms of effectiveness, while it remains very competitive in terms of efficiency." @default.
- W3162130132 created "2021-05-24" @default.
- W3162130132 creator A5007741547 @default.
- W3162130132 creator A5024884892 @default.
- W3162130132 creator A5034417458 @default.
- W3162130132 creator A5068184256 @default.
- W3162130132 date "2021-10-01" @default.
- W3162130132 modified "2023-10-12" @default.
- W3162130132 title "TOMATE: A heuristic-based approach to extract data from HTML tables" @default.
- W3162130132 cites W1967774582 @default.
- W3162130132 cites W1972495172 @default.
- W3162130132 cites W1982134088 @default.
- W3162130132 cites W1999361961 @default.
- W3162130132 cites W2004193186 @default.
- W3162130132 cites W2024091454 @default.
- W3162130132 cites W2035168844 @default.
- W3162130132 cites W2078206655 @default.
- W3162130132 cites W2134708958 @default.
- W3162130132 cites W2148317291 @default.
- W3162130132 cites W2185942778 @default.
- W3162130132 cites W2471014220 @default.
- W3162130132 cites W2477934135 @default.
- W3162130132 cites W2505815569 @default.
- W3162130132 cites W2604225376 @default.
- W3162130132 cites W3008881932 @default.
- W3162130132 cites W4240555197 @default.
- W3162130132 doi "https://doi.org/10.1016/j.ins.2021.04.087" @default.
- W3162130132 hasPublicationYear "2021" @default.
- W3162130132 type Work @default.
- W3162130132 sameAs 3162130132 @default.
- W3162130132 citedByCount "7" @default.
- W3162130132 countsByYear W31621301322021 @default.
- W3162130132 countsByYear W31621301322022 @default.
- W3162130132 crossrefType "journal-article" @default.
- W3162130132 hasAuthorship W3162130132A5007741547 @default.
- W3162130132 hasAuthorship W3162130132A5024884892 @default.
- W3162130132 hasAuthorship W3162130132A5034417458 @default.
- W3162130132 hasAuthorship W3162130132A5068184256 @default.
- W3162130132 hasBestOaLocation W31621301322 @default.
- W3162130132 hasConcept C111919701 @default.
- W3162130132 hasConcept C124101348 @default.
- W3162130132 hasConcept C127576917 @default.
- W3162130132 hasConcept C127705205 @default.
- W3162130132 hasConcept C154945302 @default.
- W3162130132 hasConcept C162324750 @default.
- W3162130132 hasConcept C173801870 @default.
- W3162130132 hasConcept C187736073 @default.
- W3162130132 hasConcept C23123220 @default.
- W3162130132 hasConcept C2778572836 @default.
- W3162130132 hasConcept C41008148 @default.
- W3162130132 hasConcept C45235069 @default.
- W3162130132 hasConcept C73555534 @default.
- W3162130132 hasConceptScore W3162130132C111919701 @default.
- W3162130132 hasConceptScore W3162130132C124101348 @default.
- W3162130132 hasConceptScore W3162130132C127576917 @default.
- W3162130132 hasConceptScore W3162130132C127705205 @default.
- W3162130132 hasConceptScore W3162130132C154945302 @default.
- W3162130132 hasConceptScore W3162130132C162324750 @default.
- W3162130132 hasConceptScore W3162130132C173801870 @default.
- W3162130132 hasConceptScore W3162130132C187736073 @default.
- W3162130132 hasConceptScore W3162130132C23123220 @default.
- W3162130132 hasConceptScore W3162130132C2778572836 @default.
- W3162130132 hasConceptScore W3162130132C41008148 @default.
- W3162130132 hasConceptScore W3162130132C45235069 @default.
- W3162130132 hasConceptScore W3162130132C73555534 @default.
- W3162130132 hasFunder F4320326262 @default.
- W3162130132 hasLocation W31621301321 @default.
- W3162130132 hasLocation W31621301322 @default.
- W3162130132 hasOpenAccess W3162130132 @default.
- W3162130132 hasPrimaryLocation W31621301321 @default.
- W3162130132 hasRelatedWork W2007032764 @default.
- W3162130132 hasRelatedWork W2067280619 @default.
- W3162130132 hasRelatedWork W2115618655 @default.
- W3162130132 hasRelatedWork W2483226803 @default.
- W3162130132 hasRelatedWork W2513360157 @default.
- W3162130132 hasRelatedWork W3125143773 @default.
- W3162130132 hasRelatedWork W3143937874 @default.
- W3162130132 hasRelatedWork W3177062893 @default.
- W3162130132 hasRelatedWork W4312926500 @default.
- W3162130132 hasRelatedWork W803550684 @default.
- W3162130132 hasVolume "577" @default.
- W3162130132 isParatext "false" @default.
- W3162130132 isRetracted "false" @default.
- W3162130132 magId "3162130132" @default.
- W3162130132 workType "article" @default.
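
The listing above follows a regular shape: a leading dash, a subject, a predicate, an object, and a graph marker such as `@default.`. As a minimal sketch, assuming that shape holds for every row, the rows could be parsed into tuples and grouped by predicate as shown below. The names `LINE_RE`, `parse_line`, and `group_by_predicate` are hypothetical helpers introduced only for illustration; they are not part of SemOpenAlex or any library.

```python
import re
from collections import defaultdict

# Assumed row shape (inferred from the listing, not an official format):
#   "- SUBJECT PREDICATE OBJECT @GRAPH."
LINE_RE = re.compile(r'^-\s+(\S+)\s+(\S+)\s+(.*)\s+@(\S+?)\.$')


def parse_line(line):
    """Return (subject, predicate, object, graph), or None if the row does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    subject, predicate, obj, graph = match.groups()
    # Literals appear in double quotes; strip them so "49" becomes 49 as text.
    if len(obj) >= 2 and obj[0] == '"' and obj[-1] == '"':
        obj = obj[1:-1]
    return subject, predicate, obj, graph


def group_by_predicate(lines):
    """Collect object values per predicate, skipping rows that do not parse."""
    grouped = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            _, predicate, obj, _ = parsed
            grouped[predicate].append(obj)
    return dict(grouped)


# A few rows copied from the listing above.
sample = [
    '- W3162130132 startingPage "49" @default.',
    '- W3162130132 endingPage "68" @default.',
    '- W3162130132 cites W1967774582 @default.',
    '- W3162130132 cites W1972495172 @default.',
]

grouped = group_by_predicate(sample)
print(grouped["cites"])
```

Grouping by predicate makes multi-valued properties such as `cites` or `hasConcept` easy to count or iterate over, while single-valued ones such as `startingPage` end up as one-element lists.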