Matches in SemOpenAlex for { <https://semopenalex.org/work/W2097979891> ?p ?o ?g. }
Showing items 1 to 92 of 92 with 100 items per page.
- W2097979891 abstract "Search engines focusing on particular media types face difficulties in discovering suitable URIs on the Web. Since the engines are only interested in a small fraction of the Web, a crawler should use heuristics to concentrate on that fraction. To devise such a heuristic, we postulate four hypotheses based on RFCs and W3C recommendations to find cues for certain content types. Tests on a corpus of 22m files (793GB content size) containing 630m URIs show that for the content types text, image, and application, the recommendations are mostly being followed, while results for audio and video are much less consistent. Our findings and recommendations can be implemented as heuristics for efficient discovery of structured content on the Web on top of existing crawlers." @default.
- W2097979891 created "2016-06-24" @default.
- W2097979891 creator A5033280983 @default.
- W2097979891 creator A5048726103 @default.
- W2097979891 creator A5070504151 @default.
- W2097979891 creator A5071104283 @default.
- W2097979891 date "2008-07-01" @default.
- W2097979891 modified "2023-09-30" @default.
- W2097979891 title "Four Heuristics to Guide Structured Content Crawling" @default.
- W2097979891 cites W1489992655 @default.
- W2097979891 cites W1503924817 @default.
- W2097979891 cites W1506845741 @default.
- W2097979891 cites W1545469724 @default.
- W2097979891 cites W1559492731 @default.
- W2097979891 cites W1613836731 @default.
- W2097979891 cites W2019194162 @default.
- W2097979891 cites W2080676333 @default.
- W2097979891 cites W2107252390 @default.
- W2097979891 cites W2113313827 @default.
- W2097979891 cites W2124673015 @default.
- W2097979891 cites W2132573925 @default.
- W2097979891 cites W2140279085 @default.
- W2097979891 doi "https://doi.org/10.1109/icwe.2008.42" @default.
- W2097979891 hasPublicationYear "2008" @default.
- W2097979891 type Work @default.
- W2097979891 sameAs 2097979891 @default.
- W2097979891 citedByCount "9" @default.
- W2097979891 countsByYear W20979798912012 @default.
- W2097979891 countsByYear W20979798912014 @default.
- W2097979891 countsByYear W20979798912015 @default.
- W2097979891 countsByYear W20979798912019 @default.
- W2097979891 crossrefType "proceedings-article" @default.
- W2097979891 hasAuthorship W2097979891A5033280983 @default.
- W2097979891 hasAuthorship W2097979891A5048726103 @default.
- W2097979891 hasAuthorship W2097979891A5070504151 @default.
- W2097979891 hasAuthorship W2097979891A5071104283 @default.
- W2097979891 hasBestOaLocation W20979798912 @default.
- W2097979891 hasConcept C100368936 @default.
- W2097979891 hasConcept C105702510 @default.
- W2097979891 hasConcept C111919701 @default.
- W2097979891 hasConcept C127705205 @default.
- W2097979891 hasConcept C134306372 @default.
- W2097979891 hasConcept C136764020 @default.
- W2097979891 hasConcept C13743948 @default.
- W2097979891 hasConcept C149629883 @default.
- W2097979891 hasConcept C154945302 @default.
- W2097979891 hasConcept C173801870 @default.
- W2097979891 hasConcept C178790620 @default.
- W2097979891 hasConcept C185592680 @default.
- W2097979891 hasConcept C21959979 @default.
- W2097979891 hasConcept C23123220 @default.
- W2097979891 hasConcept C2778152352 @default.
- W2097979891 hasConcept C33923547 @default.
- W2097979891 hasConcept C41008148 @default.
- W2097979891 hasConcept C71924100 @default.
- W2097979891 hasConceptScore W2097979891C100368936 @default.
- W2097979891 hasConceptScore W2097979891C105702510 @default.
- W2097979891 hasConceptScore W2097979891C111919701 @default.
- W2097979891 hasConceptScore W2097979891C127705205 @default.
- W2097979891 hasConceptScore W2097979891C134306372 @default.
- W2097979891 hasConceptScore W2097979891C136764020 @default.
- W2097979891 hasConceptScore W2097979891C13743948 @default.
- W2097979891 hasConceptScore W2097979891C149629883 @default.
- W2097979891 hasConceptScore W2097979891C154945302 @default.
- W2097979891 hasConceptScore W2097979891C173801870 @default.
- W2097979891 hasConceptScore W2097979891C178790620 @default.
- W2097979891 hasConceptScore W2097979891C185592680 @default.
- W2097979891 hasConceptScore W2097979891C21959979 @default.
- W2097979891 hasConceptScore W2097979891C23123220 @default.
- W2097979891 hasConceptScore W2097979891C2778152352 @default.
- W2097979891 hasConceptScore W2097979891C33923547 @default.
- W2097979891 hasConceptScore W2097979891C41008148 @default.
- W2097979891 hasConceptScore W2097979891C71924100 @default.
- W2097979891 hasLocation W20979798911 @default.
- W2097979891 hasLocation W20979798912 @default.
- W2097979891 hasLocation W20979798913 @default.
- W2097979891 hasOpenAccess W2097979891 @default.
- W2097979891 hasPrimaryLocation W20979798911 @default.
- W2097979891 hasRelatedWork W1506122440 @default.
- W2097979891 hasRelatedWork W1673346501 @default.
- W2097979891 hasRelatedWork W2042201515 @default.
- W2097979891 hasRelatedWork W2051135816 @default.
- W2097979891 hasRelatedWork W2120136770 @default.
- W2097979891 hasRelatedWork W2161927007 @default.
- W2097979891 hasRelatedWork W2548298479 @default.
- W2097979891 hasRelatedWork W2783570127 @default.
- W2097979891 hasRelatedWork W3216588747 @default.
- W2097979891 hasRelatedWork W4300913644 @default.
- W2097979891 isParatext "false" @default.
- W2097979891 isRetracted "false" @default.
- W2097979891 magId "2097979891" @default.
- W2097979891 workType "article" @default.