Matches in SemOpenAlex for { <https://semopenalex.org/work/W2751078258> ?p ?o ?g. }
- W2751078258 endingPage "407" @default.
- W2751078258 startingPage "399" @default.
- W2751078258 abstract "In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text features such as textual delimiters, keywords, constants or text patterns, which we call handles, to construct patterns for the target data regions and data records. We offer a polynomial-time algorithm, in which these patterns are checked against the page elements in a mixed bottom-up and top-down traversal of the DOM-tree. The extracted data is directly mapped onto a hierarchical XML structure, which forms the output of the wrapper. The wrappers that are generated by this method are robust and independent of the HTML structure. Therefore, they can be adapted to similar websites to gather and integrate information." @default.
- W2751078258 created "2017-09-15" @default.
- W2751078258 creator A5006701961 @default.
- W2751078258 creator A5019465328 @default.
- W2751078258 creator A5060882243 @default.
- W2751078258 date "2018-07-01" @default.
- W2751078258 modified "2023-09-26" @default.
- W2751078258 title "Data Extraction using Content-Based Handles" @default.
- W2751078258 cites W2064466058 @default.
- W2751078258 cites W2069388662 @default.
- W2751078258 cites W2121137265 @default.
- W2751078258 cites W2128836931 @default.
- W2751078258 cites W2138480559 @default.
- W2751078258 cites W2143309843 @default.
- W2751078258 cites W2148317291 @default.
- W2751078258 cites W2154872327 @default.
- W2751078258 cites W2166407869 @default.
- W2751078258 cites W2171364811 @default.
- W2751078258 cites W6578894 @default.
- W2751078258 doi "https://doi.org/10.22044/jadm.2017.990" @default.
- W2751078258 hasPublicationYear "2018" @default.
- W2751078258 type Work @default.
- W2751078258 sameAs 2751078258 @default.
- W2751078258 citedByCount "3" @default.
- W2751078258 countsByYear W27510782582020 @default.
- W2751078258 countsByYear W27510782582021 @default.
- W2751078258 crossrefType "journal-article" @default.
- W2751078258 hasAuthorship W2751078258A5006701961 @default.
- W2751078258 hasAuthorship W2751078258A5019465328 @default.
- W2751078258 hasAuthorship W2751078258A5060882243 @default.
- W2751078258 hasConcept C113174947 @default.
- W2751078258 hasConcept C124101348 @default.
- W2751078258 hasConcept C13280743 @default.
- W2751078258 hasConcept C134306372 @default.
- W2751078258 hasConcept C136764020 @default.
- W2751078258 hasConcept C137922610 @default.
- W2751078258 hasConcept C176809094 @default.
- W2751078258 hasConcept C17744445 @default.
- W2751078258 hasConcept C195807954 @default.
- W2751078258 hasConcept C199360897 @default.
- W2751078258 hasConcept C199539241 @default.
- W2751078258 hasConcept C205649164 @default.
- W2751078258 hasConcept C21959979 @default.
- W2751078258 hasConcept C23123220 @default.
- W2751078258 hasConcept C2777466982 @default.
- W2751078258 hasConcept C2779473830 @default.
- W2751078258 hasConcept C2780801425 @default.
- W2751078258 hasConcept C33923547 @default.
- W2751078258 hasConcept C41008148 @default.
- W2751078258 hasConcept C8797682 @default.
- W2751078258 hasConceptScore W2751078258C113174947 @default.
- W2751078258 hasConceptScore W2751078258C124101348 @default.
- W2751078258 hasConceptScore W2751078258C13280743 @default.
- W2751078258 hasConceptScore W2751078258C134306372 @default.
- W2751078258 hasConceptScore W2751078258C136764020 @default.
- W2751078258 hasConceptScore W2751078258C137922610 @default.
- W2751078258 hasConceptScore W2751078258C176809094 @default.
- W2751078258 hasConceptScore W2751078258C17744445 @default.
- W2751078258 hasConceptScore W2751078258C195807954 @default.
- W2751078258 hasConceptScore W2751078258C199360897 @default.
- W2751078258 hasConceptScore W2751078258C199539241 @default.
- W2751078258 hasConceptScore W2751078258C205649164 @default.
- W2751078258 hasConceptScore W2751078258C21959979 @default.
- W2751078258 hasConceptScore W2751078258C23123220 @default.
- W2751078258 hasConceptScore W2751078258C2777466982 @default.
- W2751078258 hasConceptScore W2751078258C2779473830 @default.
- W2751078258 hasConceptScore W2751078258C2780801425 @default.
- W2751078258 hasConceptScore W2751078258C33923547 @default.
- W2751078258 hasConceptScore W2751078258C41008148 @default.
- W2751078258 hasConceptScore W2751078258C8797682 @default.
- W2751078258 hasIssue "2" @default.
- W2751078258 hasLocation W27510782581 @default.
- W2751078258 hasOpenAccess W2751078258 @default.
- W2751078258 hasPrimaryLocation W27510782581 @default.
- W2751078258 hasRelatedWork W1482104932 @default.
- W2751078258 hasRelatedWork W1489897710 @default.
- W2751078258 hasRelatedWork W1927392092 @default.
- W2751078258 hasRelatedWork W1968053850 @default.
- W2751078258 hasRelatedWork W1997836898 @default.
- W2751078258 hasRelatedWork W1998089058 @default.
- W2751078258 hasRelatedWork W2026345620 @default.
- W2751078258 hasRelatedWork W2031790754 @default.
- W2751078258 hasRelatedWork W2061189702 @default.
- W2751078258 hasRelatedWork W2069388662 @default.
- W2751078258 hasRelatedWork W2080375353 @default.
- W2751078258 hasRelatedWork W2094610447 @default.
- W2751078258 hasRelatedWork W2105602638 @default.
- W2751078258 hasRelatedWork W2163072729 @default.
- W2751078258 hasRelatedWork W2169262681 @default.
- W2751078258 hasRelatedWork W2187337573 @default.
- W2751078258 hasRelatedWork W2290163979 @default.
- W2751078258 hasRelatedWork W2311530945 @default.
- W2751078258 hasRelatedWork W2371193303 @default.
- W2751078258 hasRelatedWork W2393035056 @default.
- W2751078258 hasVolume "6" @default.
- W2751078258 isParatext "false" @default.
- W2751078258 isRetracted "false" @default.
- W2751078258 magId "2751078258" @default.