Matches in SemOpenAlex for { <https://semopenalex.org/work/W1482104932> ?p ?o ?g. }
- W1482104932 endingPage "415" @default.
- W1482104932 startingPage "410" @default.
- W1482104932 abstract "This paper studies web wrapper generation for web pages of forum, blog and news web sites. More and more web pages are dynamically generated using a common template populated with data from databases. This paper proposes a novel method that uses tree alignment and transfer learning to generate the wrapper from this kind of web page. We present a new tree alignment algorithm to find the best matching structure of the input web pages. A linear regression method is employed to obtain the weights of different tag matchings. Based on the alignment, we merge the trees into one union tree whose nodes record statistical information gathered from multiple web pages. We use a transfer learning method to find the most likely content block and use the alignment algorithm to detect repeated patterns in the union tree. After that, we generate a wrapper to extract data from the web pages. Experimental results show that the method can achieve high extraction accuracy and has stable performance." @default.
- W1482104932 created "2016-06-24" @default.
- W1482104932 creator A5016039620 @default.
- W1482104932 creator A5031068005 @default.
- W1482104932 creator A5050811132 @default.
- W1482104932 date "2010-06-23" @default.
- W1482104932 modified "2023-09-22" @default.
- W1482104932 title "Web wrapper generation using tree alignment and transfer learning" @default.
- W1482104932 cites W108340680 @default.
- W1482104932 cites W1988217119 @default.
- W1482104932 cites W1990117407 @default.
- W1482104932 cites W2005646337 @default.
- W1482104932 cites W2034797903 @default.
- W1482104932 cites W2037504378 @default.
- W1482104932 cites W2069388662 @default.
- W1482104932 cites W2083317499 @default.
- W1482104932 cites W2093559286 @default.
- W1482104932 cites W2096496923 @default.
- W1482104932 cites W2102694093 @default.
- W1482104932 cites W2104086170 @default.
- W1482104932 cites W2128341918 @default.
- W1482104932 cites W2129595335 @default.
- W1482104932 cites W2133669904 @default.
- W1482104932 cites W2134150392 @default.
- W1482104932 cites W2135479443 @default.
- W1482104932 cites W2143309843 @default.
- W1482104932 cites W2150721933 @default.
- W1482104932 cites W2152917747 @default.
- W1482104932 cites W2153072229 @default.
- W1482104932 cites W2160196229 @default.
- W1482104932 cites W2165666571 @default.
- W1482104932 cites W2167435152 @default.
- W1482104932 cites W2394167421 @default.
- W1482104932 hasPublicationYear "2010" @default.
- W1482104932 type Work @default.
- W1482104932 sameAs 1482104932 @default.
- W1482104932 citedByCount "0" @default.
- W1482104932 crossrefType "proceedings-article" @default.
- W1482104932 hasAuthorship W1482104932A5016039620 @default.
- W1482104932 hasAuthorship W1482104932A5031068005 @default.
- W1482104932 hasAuthorship W1482104932A5050811132 @default.
- W1482104932 hasConcept C105795698 @default.
- W1482104932 hasConcept C113174947 @default.
- W1482104932 hasConcept C118643609 @default.
- W1482104932 hasConcept C124101348 @default.
- W1482104932 hasConcept C130436687 @default.
- W1482104932 hasConcept C134306372 @default.
- W1482104932 hasConcept C136764020 @default.
- W1482104932 hasConcept C137922610 @default.
- W1482104932 hasConcept C162005631 @default.
- W1482104932 hasConcept C165064840 @default.
- W1482104932 hasConcept C197046077 @default.
- W1482104932 hasConcept C197129107 @default.
- W1482104932 hasConcept C21959979 @default.
- W1482104932 hasConcept C23123220 @default.
- W1482104932 hasConcept C33923547 @default.
- W1482104932 hasConcept C41008148 @default.
- W1482104932 hasConcept C77088390 @default.
- W1482104932 hasConceptScore W1482104932C105795698 @default.
- W1482104932 hasConceptScore W1482104932C113174947 @default.
- W1482104932 hasConceptScore W1482104932C118643609 @default.
- W1482104932 hasConceptScore W1482104932C124101348 @default.
- W1482104932 hasConceptScore W1482104932C130436687 @default.
- W1482104932 hasConceptScore W1482104932C134306372 @default.
- W1482104932 hasConceptScore W1482104932C136764020 @default.
- W1482104932 hasConceptScore W1482104932C137922610 @default.
- W1482104932 hasConceptScore W1482104932C162005631 @default.
- W1482104932 hasConceptScore W1482104932C165064840 @default.
- W1482104932 hasConceptScore W1482104932C197046077 @default.
- W1482104932 hasConceptScore W1482104932C197129107 @default.
- W1482104932 hasConceptScore W1482104932C21959979 @default.
- W1482104932 hasConceptScore W1482104932C23123220 @default.
- W1482104932 hasConceptScore W1482104932C33923547 @default.
- W1482104932 hasConceptScore W1482104932C41008148 @default.
- W1482104932 hasConceptScore W1482104932C77088390 @default.
- W1482104932 hasLocation W14821049321 @default.
- W1482104932 hasOpenAccess W1482104932 @default.
- W1482104932 hasPrimaryLocation W14821049321 @default.
- W1482104932 hasRelatedWork W111499458 @default.
- W1482104932 hasRelatedWork W1507164096 @default.
- W1482104932 hasRelatedWork W1509381869 @default.
- W1482104932 hasRelatedWork W1541154936 @default.
- W1482104932 hasRelatedWork W180238871 @default.
- W1482104932 hasRelatedWork W1971728686 @default.
- W1482104932 hasRelatedWork W1987947616 @default.
- W1482104932 hasRelatedWork W2048880500 @default.
- W1482104932 hasRelatedWork W2049365470 @default.
- W1482104932 hasRelatedWork W2093590786 @default.
- W1482104932 hasRelatedWork W2114321451 @default.
- W1482104932 hasRelatedWork W2140527871 @default.
- W1482104932 hasRelatedWork W2160819161 @default.
- W1482104932 hasRelatedWork W2169899598 @default.
- W1482104932 hasRelatedWork W2246771995 @default.
- W1482104932 hasRelatedWork W2393035056 @default.
- W1482104932 hasRelatedWork W2540109410 @default.
- W1482104932 hasRelatedWork W2751078258 @default.
- W1482104932 hasRelatedWork W2112800333 @default.
- W1482104932 hasRelatedWork W3182043260 @default.