Matches in SemOpenAlex for { <https://semopenalex.org/work/W4310235752> ?p ?o ?g. }
Showing items 1 to 100 of 100 with 100 items per page.
- W4310235752 endingPage "12031" @default.
- W4310235752 startingPage "12031" @default.
- W4310235752 abstract "This paper presents a tool for extracting the main text and images (extracting and parsing the important data) from a web document. It describes our proposed algorithm, based on the Document Object Model (DOM) and natural language processing (NLP) techniques, along with other approaches for extracting information from web pages using classification techniques such as support vector machines, decision trees, naive Bayes, and K-nearest neighbors. The main aim of the developed algorithm was to identify and extract the main block of a web document that contains the text of the article and the relevant images. The algorithm was applied to a sample of 45 web documents of different types. In addition, the issue of web pages, from the structure of the document to the use of the DOM for their processing, was analyzed. The DOM was used to load and navigate the document. It also plays an important role in the correct identification of the main block of web documents. The paper also discusses the levels of natural language. These methods of automatic natural language processing help to identify the main block of the web document. In this way, all textual parts and images from the main content of the web document were extracted. The experimental results show that our method achieved a final classification accuracy of 88.18%." @default.
- W4310235752 created "2022-11-30" @default.
- W4310235752 creator A5007114721 @default.
- W4310235752 creator A5050296539 @default.
- W4310235752 creator A5061330286 @default.
- W4310235752 creator A5085268197 @default.
- W4310235752 creator A5091067383 @default.
- W4310235752 date "2022-11-24" @default.
- W4310235752 modified "2023-10-05" @default.
- W4310235752 title "Tool for Parsing Important Data from Web Pages" @default.
- W4310235752 cites W1978137103 @default.
- W4310235752 cites W1979017997 @default.
- W4310235752 cites W2016539216 @default.
- W4310235752 cites W2045986835 @default.
- W4310235752 cites W2069205071 @default.
- W4310235752 cites W2080983714 @default.
- W4310235752 cites W2091855574 @default.
- W4310235752 cites W2096482843 @default.
- W4310235752 cites W2118023920 @default.
- W4310235752 cites W2119966992 @default.
- W4310235752 cites W2148317291 @default.
- W4310235752 cites W2569822874 @default.
- W4310235752 cites W2612984812 @default.
- W4310235752 cites W2799081297 @default.
- W4310235752 cites W2804022826 @default.
- W4310235752 cites W2898763787 @default.
- W4310235752 cites W2907927219 @default.
- W4310235752 cites W3004334530 @default.
- W4310235752 cites W3014183340 @default.
- W4310235752 cites W3019027838 @default.
- W4310235752 cites W3049099845 @default.
- W4310235752 cites W3102350406 @default.
- W4310235752 cites W2030968134 @default.
- W4310235752 doi "https://doi.org/10.3390/app122312031" @default.
- W4310235752 hasPublicationYear "2022" @default.
- W4310235752 type Work @default.
- W4310235752 citedByCount "0" @default.
- W4310235752 crossrefType "journal-article" @default.
- W4310235752 hasAuthorship W4310235752A5007114721 @default.
- W4310235752 hasAuthorship W4310235752A5050296539 @default.
- W4310235752 hasAuthorship W4310235752A5061330286 @default.
- W4310235752 hasAuthorship W4310235752A5085268197 @default.
- W4310235752 hasAuthorship W4310235752A5091067383 @default.
- W4310235752 hasBestOaLocation W43102357521 @default.
- W4310235752 hasConcept C116834253 @default.
- W4310235752 hasConcept C12267149 @default.
- W4310235752 hasConcept C136764020 @default.
- W4310235752 hasConcept C137922610 @default.
- W4310235752 hasConcept C154945302 @default.
- W4310235752 hasConcept C186644900 @default.
- W4310235752 hasConcept C197046077 @default.
- W4310235752 hasConcept C204321447 @default.
- W4310235752 hasConcept C21959979 @default.
- W4310235752 hasConcept C23123220 @default.
- W4310235752 hasConcept C2524010 @default.
- W4310235752 hasConcept C2777210771 @default.
- W4310235752 hasConcept C2781238097 @default.
- W4310235752 hasConcept C33923547 @default.
- W4310235752 hasConcept C41008148 @default.
- W4310235752 hasConcept C52001869 @default.
- W4310235752 hasConcept C59822182 @default.
- W4310235752 hasConcept C86803240 @default.
- W4310235752 hasConceptScore W4310235752C116834253 @default.
- W4310235752 hasConceptScore W4310235752C12267149 @default.
- W4310235752 hasConceptScore W4310235752C136764020 @default.
- W4310235752 hasConceptScore W4310235752C137922610 @default.
- W4310235752 hasConceptScore W4310235752C154945302 @default.
- W4310235752 hasConceptScore W4310235752C186644900 @default.
- W4310235752 hasConceptScore W4310235752C197046077 @default.
- W4310235752 hasConceptScore W4310235752C204321447 @default.
- W4310235752 hasConceptScore W4310235752C21959979 @default.
- W4310235752 hasConceptScore W4310235752C23123220 @default.
- W4310235752 hasConceptScore W4310235752C2524010 @default.
- W4310235752 hasConceptScore W4310235752C2777210771 @default.
- W4310235752 hasConceptScore W4310235752C2781238097 @default.
- W4310235752 hasConceptScore W4310235752C33923547 @default.
- W4310235752 hasConceptScore W4310235752C41008148 @default.
- W4310235752 hasConceptScore W4310235752C52001869 @default.
- W4310235752 hasConceptScore W4310235752C59822182 @default.
- W4310235752 hasConceptScore W4310235752C86803240 @default.
- W4310235752 hasIssue "23" @default.
- W4310235752 hasLocation W43102357521 @default.
- W4310235752 hasOpenAccess W4310235752 @default.
- W4310235752 hasPrimaryLocation W43102357521 @default.
- W4310235752 hasRelatedWork W2088059031 @default.
- W4310235752 hasRelatedWork W2133814403 @default.
- W4310235752 hasRelatedWork W2167662847 @default.
- W4310235752 hasRelatedWork W2293457016 @default.
- W4310235752 hasRelatedWork W2369308426 @default.
- W4310235752 hasRelatedWork W2383869160 @default.
- W4310235752 hasRelatedWork W2502722637 @default.
- W4310235752 hasRelatedWork W2977842567 @default.
- W4310235752 hasRelatedWork W4378977073 @default.
- W4310235752 hasRelatedWork W1551406738 @default.
- W4310235752 hasVolume "12" @default.
- W4310235752 isParatext "false" @default.
- W4310235752 isRetracted "false" @default.
- W4310235752 workType "article" @default.
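
The rows above all follow one shape: a subject, a predicate, an object (a quoted literal or a bare identifier), and a graph marker. As a minimal sketch, assuming that shape holds for every row, the listing could be grouped by predicate as shown below. The names `LINE_RE` and `parse_listing` are hypothetical and are not part of SemOpenAlex or any library. The regular expression is inferred from this page only and is not a general RDF parser.

```python
import re
from collections import defaultdict

# Assumed row shape, inferred from the listing: "- SUBJECT PREDICATE OBJECT @GRAPH."
# OBJECT is either a double-quoted literal or a bare identifier.
LINE_RE = re.compile(
    r'^-\s+(?P<s>\S+)\s+(?P<p>\S+)\s+(?P<o>"(?:[^"\\]|\\.)*"|\S+)\s+@(?P<g>\w+)\.$'
)


def parse_listing(text):
    """Group listing rows by predicate: {predicate: [object, ...]}."""
    by_predicate = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip the header, pagination text, and unrecognized lines
        obj = match.group("o")
        if obj.startswith('"'):
            obj = obj[1:-1]  # drop the quotes that surround a literal
        by_predicate[match.group("p")].append(obj)
    return dict(by_predicate)


# A few rows copied from the listing above.
sample = """\
- W4310235752 title "Tool for Parsing Important Data from Web Pages" @default.
- W4310235752 creator A5007114721 @default.
- W4310235752 creator A5050296539 @default.
- W4310235752 hasVolume "12" @default.
"""

record = parse_listing(sample)
print(record["title"][0])
print(len(record["creator"]))
```

Grouping by predicate keeps repeated predicates such as `creator`, `cites`, and `hasConcept` together as lists, which matches how the listing repeats one row per value.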