Matches in SemOpenAlex for { <https://semopenalex.org/work/W130279117> ?p ?o ?g. }
- W130279117 endingPage "752" @default.
- W130279117 startingPage "750" @default.
- W130279117 abstract "The World Wide Web has become one of the most important connections to various sources of information. A large proportion of the data is embedded in HTML documents. This language serves the visual presentation of data in Internet browsers, but does not provide semantic information about the data presented. This form of data presentation is therefore inappropriate for the demands of automated, computer-assisted information management systems. In particular, if data from different sources needs to be combined, it is necessary to develop special and often complex programs to automate the data extraction. Wrappers are specialized program routines that fulfil such tasks. They automatically extract data from web sites and convert the information into a structured format. As the manual coding of wrappers is a time-consuming and error-prone process, different methods [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12] have been proposed to automate the wrapper generation process. As a rule, however, a specially developed wrapper is required for each individual data source, because of the different and unique structures of web sites. The WWW is also extremely dynamic and continually evolving, which results in frequent changes in the structures of web documents. Consequently, wrappers may stop working when the structures of the corresponding documents change, no matter how they were generated. It is often necessary to constantly update or even completely rewrite existing wrappers in order to maintain the desired data extraction capabilities. The simplest way to maintain wrappers is to re-create them from the new HTML documents. Obviously, this method is inefficient in that the maintenance depends mostly on the system developers. In this demo, we propose a novel schema-guided approach for wrapper maintenance, called SG-WRAM, which is based on our previous work, a schema-guided wrapper generation system (SG-WRAP [8, 9]).
SG-WRAP can generate a wrapper that extracts data from an HTML document and produces an XML document conforming to a user-defined schema. Although changes to HTML documents vary widely, some features of the desired information in the previous document, e.g. syntactic features, data patterns, notation and underlying schemas, are still preserved in the changed one. Syntactic features, data patterns and notation can easily be obtained from the schemas, previous rules and extraction results. Therefore, it is feasible to recognize data items in the changed document using these features. Based on these observations, we carry out the maintenance in four sequential steps. First, syntactic features, data patterns and notation are obtained from the schema, the previous rule and the extracted results. Second, these features are used to recognize the data items in the changed document. Third, the recognized items are grouped according to the given schema; each group is an instance of the given schema. Finally, representative instances are selected to re-induce the extraction rule. We name these four steps features discovery, item recovery, block configuration and wrapper reparation, respectively. Our schema-guided method for wrapper maintenance has several unique features compared to related work. We make good use of the schema, which is given by the user during wrapper generation, to assist the procedures of item recovery and block configuration. Our experience with real-life web documents shows that our method can deal with changes ranging from simple to complex, including context shift, structural shift [12] and hybrid changes. Our system applies different methods to simple changes, in which only part of the rule is disabled, and to complex changes, in which most of the rule is disabled. This makes the re-induced rule more accurate and complete." @default.
- W130279117 created "2016-06-24" @default.
- W130279117 creator A5000446805 @default.
- W130279117 creator A5031593368 @default.
- W130279117 creator A5069427811 @default.
- W130279117 creator A5082515001 @default.
- W130279117 date "2003-01-01" @default.
- W130279117 modified "2023-09-24" @default.
- W130279117 title "SG-WRAM Schema Guided Wrapper Maintenance." @default.
- W130279117 cites W1498241032 @default.
- W130279117 cites W1553019137 @default.
- W130279117 cites W1602270052 @default.
- W130279117 cites W1921703248 @default.
- W130279117 cites W1927338256 @default.
- W130279117 cites W1995746869 @default.
- W130279117 cites W2026080185 @default.
- W130279117 cites W2097519514 @default.
- W130279117 cites W2136500370 @default.
- W130279117 cites W2147100344 @default.
- W130279117 cites W2148210463 @default.
- W130279117 cites W2154763111 @default.
- W130279117 cites W2156049581 @default.
- W130279117 cites W2912161846 @default.
- W130279117 hasPublicationYear "2003" @default.
- W130279117 type Work @default.
- W130279117 sameAs 130279117 @default.
- W130279117 citedByCount "0" @default.
- W130279117 crossrefType "proceedings-article" @default.
- W130279117 hasAuthorship W130279117A5000446805 @default.
- W130279117 hasAuthorship W130279117A5031593368 @default.
- W130279117 hasAuthorship W130279117A5069427811 @default.
- W130279117 hasAuthorship W130279117A5082515001 @default.
- W130279117 hasConcept C105795698 @default.
- W130279117 hasConcept C110875604 @default.
- W130279117 hasConcept C136764020 @default.
- W130279117 hasConcept C17744445 @default.
- W130279117 hasConcept C179518139 @default.
- W130279117 hasConcept C195807954 @default.
- W130279117 hasConcept C199360897 @default.
- W130279117 hasConcept C199539241 @default.
- W130279117 hasConcept C2129575 @default.
- W130279117 hasConcept C23123220 @default.
- W130279117 hasConcept C2777466982 @default.
- W130279117 hasConcept C2779473830 @default.
- W130279117 hasConcept C33923547 @default.
- W130279117 hasConcept C41008148 @default.
- W130279117 hasConcept C52146309 @default.
- W130279117 hasConcept C77088390 @default.
- W130279117 hasConcept C98045186 @default.
- W130279117 hasConceptScore W130279117C105795698 @default.
- W130279117 hasConceptScore W130279117C110875604 @default.
- W130279117 hasConceptScore W130279117C136764020 @default.
- W130279117 hasConceptScore W130279117C17744445 @default.
- W130279117 hasConceptScore W130279117C179518139 @default.
- W130279117 hasConceptScore W130279117C195807954 @default.
- W130279117 hasConceptScore W130279117C199360897 @default.
- W130279117 hasConceptScore W130279117C199539241 @default.
- W130279117 hasConceptScore W130279117C2129575 @default.
- W130279117 hasConceptScore W130279117C23123220 @default.
- W130279117 hasConceptScore W130279117C2777466982 @default.
- W130279117 hasConceptScore W130279117C2779473830 @default.
- W130279117 hasConceptScore W130279117C33923547 @default.
- W130279117 hasConceptScore W130279117C41008148 @default.
- W130279117 hasConceptScore W130279117C52146309 @default.
- W130279117 hasConceptScore W130279117C77088390 @default.
- W130279117 hasConceptScore W130279117C98045186 @default.
- W130279117 hasLocation W1302791171 @default.
- W130279117 hasOpenAccess W130279117 @default.
- W130279117 hasPrimaryLocation W1302791171 @default.
- W130279117 hasRelatedWork W106894449 @default.
- W130279117 hasRelatedWork W133949188 @default.
- W130279117 hasRelatedWork W1497197250 @default.
- W130279117 hasRelatedWork W1533589521 @default.
- W130279117 hasRelatedWork W1577901676 @default.
- W130279117 hasRelatedWork W165101881 @default.
- W130279117 hasRelatedWork W175503023 @default.
- W130279117 hasRelatedWork W1854286394 @default.
- W130279117 hasRelatedWork W1997281874 @default.
- W130279117 hasRelatedWork W2023883421 @default.
- W130279117 hasRelatedWork W2080390948 @default.
- W130279117 hasRelatedWork W2142507092 @default.
- W130279117 hasRelatedWork W2144632243 @default.
- W130279117 hasRelatedWork W2157752164 @default.
- W130279117 hasRelatedWork W2175483550 @default.
- W130279117 hasRelatedWork W2285276085 @default.
- W130279117 hasRelatedWork W2663509040 @default.
- W130279117 hasRelatedWork W2911768431 @default.
- W130279117 hasRelatedWork W2163784676 @default.
- W130279117 hasRelatedWork W944682745 @default.
- W130279117 isParatext "false" @default.
- W130279117 isRetracted "false" @default.
- W130279117 magId "130279117" @default.
- W130279117 workType "article" @default.