Matches in SemOpenAlex for { <https://semopenalex.org/work/W2421306863> ?p ?o ?g. }
Showing items 1 to 97 of 97 with 100 items per page.
- W2421306863 abstract "Wrapper induction is the problem of automatically inferring a query from annotated web pages of the same template. This query should not only select the annotated content accurately but also other content following the same template. Beyond accurately matching the template, we consider two additional requirements: (1) wrappers should be robust against a large class of changes to the web pages, and (2) the induction process should be noise resistant, i.e., tolerate slightly erroneous (e.g., machine generated) samples. Key to our approach is a query language that is powerful enough to permit accurate selection, but limited enough to force noisy samples to be generalized into wrappers that select the likely intended items. We introduce such a language as a subset of XPath and show that even for such a restricted language, inducing optimal queries according to a suitable scoring is infeasible. Nevertheless, our wrapper induction framework infers highly robust and noise-resistant queries. We evaluate the queries on snapshots from web pages that change over time as provided by the Internet Archive, and show that the induced queries are as robust as the human-made queries. The queries often survive hundreds, sometimes thousands, of days, with many changes to the relative position of the selected nodes (including changes at the template level). This is due to the few and discriminative anchor (intermediately selected) nodes of the generated queries. The queries are highly resistant against positive noise (up to 50%) and negative noise (up to 20%)." @default.
- W2421306863 created "2016-06-24" @default.
- W2421306863 creator A5010785818 @default.
- W2421306863 creator A5012953835 @default.
- W2421306863 creator A5024566364 @default.
- W2421306863 creator A5051365299 @default.
- W2421306863 date "2016-06-14" @default.
- W2421306863 modified "2023-09-27" @default.
- W2421306863 title "Robust and Noise Resistant Wrapper Induction" @default.
- W2421306863 cites W1531487010 @default.
- W2421306863 cites W1971415141 @default.
- W2421306863 cites W2002956097 @default.
- W2421306863 cites W2035302703 @default.
- W2421306863 cites W2037959103 @default.
- W2421306863 cites W2080132606 @default.
- W2421306863 cites W2084801987 @default.
- W2421306863 cites W2096765155 @default.
- W2421306863 cites W2097874932 @default.
- W2421306863 cites W2100965097 @default.
- W2421306863 cites W2111278149 @default.
- W2421306863 cites W2120101509 @default.
- W2421306863 cites W2120170839 @default.
- W2421306863 cites W2148317291 @default.
- W2421306863 cites W2160189941 @default.
- W2421306863 cites W2170477095 @default.
- W2421306863 cites W2296164608 @default.
- W2421306863 cites W279841946 @default.
- W2421306863 cites W3000265418 @default.
- W2421306863 doi "https://doi.org/10.1145/2882903.2915214" @default.
- W2421306863 hasPublicationYear "2016" @default.
- W2421306863 type Work @default.
- W2421306863 sameAs 2421306863 @default.
- W2421306863 citedByCount "10" @default.
- W2421306863 countsByYear W24213068632017 @default.
- W2421306863 countsByYear W24213068632018 @default.
- W2421306863 countsByYear W24213068632019 @default.
- W2421306863 countsByYear W24213068632021 @default.
- W2421306863 countsByYear W24213068632022 @default.
- W2421306863 crossrefType "proceedings-article" @default.
- W2421306863 hasAuthorship W2421306863A5010785818 @default.
- W2421306863 hasAuthorship W2421306863A5012953835 @default.
- W2421306863 hasAuthorship W2421306863A5024566364 @default.
- W2421306863 hasAuthorship W2421306863A5051365299 @default.
- W2421306863 hasBestOaLocation W24213068632 @default.
- W2421306863 hasConcept C110875604 @default.
- W2421306863 hasConcept C115961682 @default.
- W2421306863 hasConcept C124101348 @default.
- W2421306863 hasConcept C136764020 @default.
- W2421306863 hasConcept C154945302 @default.
- W2421306863 hasConcept C23123220 @default.
- W2421306863 hasConcept C26517878 @default.
- W2421306863 hasConcept C2777212361 @default.
- W2421306863 hasConcept C2780213375 @default.
- W2421306863 hasConcept C38652104 @default.
- W2421306863 hasConcept C41008148 @default.
- W2421306863 hasConcept C55348073 @default.
- W2421306863 hasConcept C81917197 @default.
- W2421306863 hasConcept C8797682 @default.
- W2421306863 hasConcept C97931131 @default.
- W2421306863 hasConcept C99498987 @default.
- W2421306863 hasConceptScore W2421306863C110875604 @default.
- W2421306863 hasConceptScore W2421306863C115961682 @default.
- W2421306863 hasConceptScore W2421306863C124101348 @default.
- W2421306863 hasConceptScore W2421306863C136764020 @default.
- W2421306863 hasConceptScore W2421306863C154945302 @default.
- W2421306863 hasConceptScore W2421306863C23123220 @default.
- W2421306863 hasConceptScore W2421306863C26517878 @default.
- W2421306863 hasConceptScore W2421306863C2777212361 @default.
- W2421306863 hasConceptScore W2421306863C2780213375 @default.
- W2421306863 hasConceptScore W2421306863C38652104 @default.
- W2421306863 hasConceptScore W2421306863C41008148 @default.
- W2421306863 hasConceptScore W2421306863C55348073 @default.
- W2421306863 hasConceptScore W2421306863C81917197 @default.
- W2421306863 hasConceptScore W2421306863C8797682 @default.
- W2421306863 hasConceptScore W2421306863C97931131 @default.
- W2421306863 hasConceptScore W2421306863C99498987 @default.
- W2421306863 hasFunder F4320334627 @default.
- W2421306863 hasFunder F4320334678 @default.
- W2421306863 hasLocation W24213068631 @default.
- W2421306863 hasLocation W24213068632 @default.
- W2421306863 hasLocation W24213068633 @default.
- W2421306863 hasOpenAccess W2421306863 @default.
- W2421306863 hasPrimaryLocation W24213068631 @default.
- W2421306863 hasRelatedWork W1519515941 @default.
- W2421306863 hasRelatedWork W1545208856 @default.
- W2421306863 hasRelatedWork W154770750 @default.
- W2421306863 hasRelatedWork W2088507760 @default.
- W2421306863 hasRelatedWork W2180299897 @default.
- W2421306863 hasRelatedWork W2331478986 @default.
- W2421306863 hasRelatedWork W2364562957 @default.
- W2421306863 hasRelatedWork W2367280756 @default.
- W2421306863 hasRelatedWork W2372230021 @default.
- W2421306863 hasRelatedWork W2372354777 @default.
- W2421306863 isParatext "false" @default.
- W2421306863 isRetracted "false" @default.
- W2421306863 magId "2421306863" @default.
- W2421306863 workType "article" @default.
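
The abstract above attributes the robustness of the induced queries to a few discriminative anchor nodes rather than to the relative position of the selected nodes. The following is a minimal sketch of that contrast only, not the paper's induction framework or query language. The two page snapshots, the `price` class, and both queries are invented for illustration, and the limited XPath support of Python's `xml.etree.ElementTree` stands in for a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Two hypothetical snapshots of the same page template (invented data).
OLD = """<html><body>
<div class="nav">Home</div>
<div class="content"><span class="price">10 EUR</span></div>
</body></html>"""

# Later snapshot: a banner was added and the content block was rearranged.
NEW = """<html><body>
<div class="banner">Sale</div>
<div class="nav">Home</div>
<div class="content"><p>New layout</p><span class="price">12 EUR</span></div>
</body></html>"""

# Hypothetical wrappers: one tied to node positions, one tied to an anchor.
POSITIONAL = "./body/div[2]/span"
ANCHORED = ".//span[@class='price']"


def select(page: str, query: str) -> list:
    """Return the text of every element that the query selects on the page."""
    root = ET.fromstring(page)
    return [element.text for element in root.findall(query)]


if __name__ == "__main__":
    for name, query in (("positional", POSITIONAL), ("anchored", ANCHORED)):
        print(name, select(OLD, query), select(NEW, query))
```

In this toy case the positional query stops matching once the banner shifts the sibling order, while the anchored query still selects the intended node in both snapshots.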