Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387337617> ?p ?o ?g. }
Showing items 1 to 80 of 80 with 100 items per page.
- W4387337617 abstract "Extracting data from Web sites is still a challenge since pages have a complex and changeable structure, and the reason is simple: Web pages are designed to be visually friendly to users, not for the task of extracting data. In addition, each page has its own varied structure based on the HTML DOM. Since Web page designers can follow their own standards when designing pages, Web page structures diverge widely. Thus, identifying and extracting information still represents a significant barrier. To overcome this challenge, we propose a new approach called EDREW, which uses the information from the HTML DOM structure and the information generated through the HTML elements to represent the context of the elements on the page without the need for rendering. We use the ELMo model to extract information and classify it as noise or useful content. The experiments were performed on the public dataset Structured Web Data Extraction (SWDE) and on a new dataset created for this work, based on the most current versions of the pages in the SWDE dataset. Using EDREW, it was possible to outperform the baselines on the original SWDE dataset and to extract twice as much page content on a new version of SWDE that we built with updated pages." @default.
- W4387337617 created "2023-10-05" @default.
- W4387337617 creator A5018547981 @default.
- W4387337617 creator A5038869659 @default.
- W4387337617 date "2023-10-23" @default.
- W4387337617 modified "2023-10-09" @default.
- W4387337617 title "EDREW - Enhanced Data Representation for Extraction in Web" @default.
- W4387337617 cites W1534875419 @default.
- W4387337617 cites W1997197279 @default.
- W4387337617 cites W2010886843 @default.
- W4387337617 cites W2015551056 @default.
- W4387337617 cites W2133669904 @default.
- W4387337617 cites W2138556038 @default.
- W4387337617 cites W2139481238 @default.
- W4387337617 cites W2143309843 @default.
- W4387337617 cites W2160189941 @default.
- W4387337617 cites W2160196229 @default.
- W4387337617 cites W2161861392 @default.
- W4387337617 cites W2171364811 @default.
- W4387337617 cites W2605038967 @default.
- W4387337617 cites W2773356393 @default.
- W4387337617 cites W2778605798 @default.
- W4387337617 cites W2788315304 @default.
- W4387337617 cites W2883357888 @default.
- W4387337617 cites W2913268761 @default.
- W4387337617 cites W2962739339 @default.
- W4387337617 cites W3133728220 @default.
- W4387337617 cites W4213039371 @default.
- W4387337617 cites W43928412 @default.
- W4387337617 doi "https://doi.org/10.1145/3617023.3617055" @default.
- W4387337617 hasPublicationYear "2023" @default.
- W4387337617 type Work @default.
- W4387337617 citedByCount "0" @default.
- W4387337617 crossrefType "proceedings-article" @default.
- W4387337617 hasAuthorship W4387337617A5018547981 @default.
- W4387337617 hasAuthorship W4387337617A5038869659 @default.
- W4387337617 hasConcept C136764020 @default.
- W4387337617 hasConcept C137922610 @default.
- W4387337617 hasConcept C154945302 @default.
- W4387337617 hasConcept C173576120 @default.
- W4387337617 hasConcept C17744445 @default.
- W4387337617 hasConcept C195807954 @default.
- W4387337617 hasConcept C199539241 @default.
- W4387337617 hasConcept C205711294 @default.
- W4387337617 hasConcept C21959979 @default.
- W4387337617 hasConcept C23123220 @default.
- W4387337617 hasConcept C2777466982 @default.
- W4387337617 hasConcept C2779473830 @default.
- W4387337617 hasConcept C41008148 @default.
- W4387337617 hasConcept C61096286 @default.
- W4387337617 hasConceptScore W4387337617C136764020 @default.
- W4387337617 hasConceptScore W4387337617C137922610 @default.
- W4387337617 hasConceptScore W4387337617C154945302 @default.
- W4387337617 hasConceptScore W4387337617C173576120 @default.
- W4387337617 hasConceptScore W4387337617C17744445 @default.
- W4387337617 hasConceptScore W4387337617C195807954 @default.
- W4387337617 hasConceptScore W4387337617C199539241 @default.
- W4387337617 hasConceptScore W4387337617C205711294 @default.
- W4387337617 hasConceptScore W4387337617C21959979 @default.
- W4387337617 hasConceptScore W4387337617C23123220 @default.
- W4387337617 hasConceptScore W4387337617C2777466982 @default.
- W4387337617 hasConceptScore W4387337617C2779473830 @default.
- W4387337617 hasConceptScore W4387337617C41008148 @default.
- W4387337617 hasConceptScore W4387337617C61096286 @default.
- W4387337617 hasLocation W43873376171 @default.
- W4387337617 hasOpenAccess W4387337617 @default.
- W4387337617 hasPrimaryLocation W43873376171 @default.
- W4387337617 hasRelatedWork W1519586109 @default.
- W4387337617 hasRelatedWork W1541158057 @default.
- W4387337617 hasRelatedWork W1674176887 @default.
- W4387337617 hasRelatedWork W2031790754 @default.
- W4387337617 hasRelatedWork W2065605022 @default.
- W4387337617 hasRelatedWork W2268257560 @default.
- W4387337617 hasRelatedWork W2357123337 @default.
- W4387337617 hasRelatedWork W2393053995 @default.
- W4387337617 hasRelatedWork W2626548695 @default.
- W4387337617 hasRelatedWork W3183046488 @default.
- W4387337617 isParatext "false" @default.
- W4387337617 isRetracted "false" @default.
- W4387337617 workType "article" @default.