Matches in SemOpenAlex for { <https://semopenalex.org/work/W2051289496> ?p ?o ?g. }
- W2051289496 endingPage "343" @default.
- W2051289496 startingPage "329" @default.
- W2051289496 abstract "The present article is concerned with the problem of automatic database population via information extraction (IE) from web pages obtained from heterogeneous sources, such as those retrieved by a domain crawler. Specifically, we address the task of filling single multi-field templates from individual documents, a common scenario that involves free-format documents with the same communicative goal such as job adverts, CVs, or meeting/seminar announcements. We discuss challenges that arise in this scenario and propose solutions to them at different levels of the processing of web page content. Our main focus is on the issue of information extraction, which we address with a two-step machine learning approach that first aims to determine segments of a page that are likely to contain relevant facts and then delimits specific natural language expressions with which to fill template fields. We also present a range of techniques for the enrichment of web pages with semantic annotations, such as recognition of named entities, domain terminology and coreference resolution, and examine their effect on the information extraction method. We evaluate the developed IE system on the task of automatically populating a database with information on language resources available on the web." @default.
- W2051289496 created "2016-06-24" @default.
- W2051289496 creator A5007426493 @default.
- W2051289496 creator A5061996885 @default.
- W2051289496 date "2007-05-02" @default.
- W2051289496 modified "2023-09-23" @default.
- W2051289496 title "Discovery of Language Resources on the Web: Information Extraction from Heterogeneous Documents" @default.
- W2051289496 cites W1553019137 @default.
- W2051289496 cites W1934019294 @default.
- W2051289496 cites W1975175514 @default.
- W2051289496 cites W1981082061 @default.
- W2051289496 cites W1986398135 @default.
- W2051289496 cites W1994485670 @default.
- W2051289496 cites W2012179495 @default.
- W2051289496 cites W2026080185 @default.
- W2051289496 cites W2044070623 @default.
- W2051289496 cites W2059933135 @default.
- W2051289496 cites W2093559286 @default.
- W2051289496 cites W2118020653 @default.
- W2051289496 cites W2123504579 @default.
- W2051289496 cites W2143349571 @default.
- W2051289496 cites W2149430911 @default.
- W2051289496 cites W2151823253 @default.
- W2051289496 cites W2162340487 @default.
- W2051289496 cites W4508078 @default.
- W2051289496 cites W13171135 @default.
- W2051289496 doi "https://doi.org/10.1093/llc/fqm010" @default.
- W2051289496 hasPublicationYear "2007" @default.
- W2051289496 type Work @default.
- W2051289496 sameAs 2051289496 @default.
- W2051289496 citedByCount "0" @default.
- W2051289496 crossrefType "journal-article" @default.
- W2051289496 hasAuthorship W2051289496A5007426493 @default.
- W2051289496 hasAuthorship W2051289496A5061996885 @default.
- W2051289496 hasBestOaLocation W20512894962 @default.
- W2051289496 hasConcept C120665830 @default.
- W2051289496 hasConcept C121332964 @default.
- W2051289496 hasConcept C134306372 @default.
- W2051289496 hasConcept C136764020 @default.
- W2051289496 hasConcept C13743948 @default.
- W2051289496 hasConcept C138268822 @default.
- W2051289496 hasConcept C138885662 @default.
- W2051289496 hasConcept C154945302 @default.
- W2051289496 hasConcept C162324750 @default.
- W2051289496 hasConcept C187736073 @default.
- W2051289496 hasConcept C192209626 @default.
- W2051289496 hasConcept C195807954 @default.
- W2051289496 hasConcept C202444582 @default.
- W2051289496 hasConcept C204321447 @default.
- W2051289496 hasConcept C21959979 @default.
- W2051289496 hasConcept C23123220 @default.
- W2051289496 hasConcept C2779135771 @default.
- W2051289496 hasConcept C2780451532 @default.
- W2051289496 hasConcept C28076734 @default.
- W2051289496 hasConcept C33923547 @default.
- W2051289496 hasConcept C36503486 @default.
- W2051289496 hasConcept C41008148 @default.
- W2051289496 hasConcept C41895202 @default.
- W2051289496 hasConcept C547195049 @default.
- W2051289496 hasConcept C9652623 @default.
- W2051289496 hasConceptScore W2051289496C120665830 @default.
- W2051289496 hasConceptScore W2051289496C121332964 @default.
- W2051289496 hasConceptScore W2051289496C134306372 @default.
- W2051289496 hasConceptScore W2051289496C136764020 @default.
- W2051289496 hasConceptScore W2051289496C13743948 @default.
- W2051289496 hasConceptScore W2051289496C138268822 @default.
- W2051289496 hasConceptScore W2051289496C138885662 @default.
- W2051289496 hasConceptScore W2051289496C154945302 @default.
- W2051289496 hasConceptScore W2051289496C162324750 @default.
- W2051289496 hasConceptScore W2051289496C187736073 @default.
- W2051289496 hasConceptScore W2051289496C192209626 @default.
- W2051289496 hasConceptScore W2051289496C195807954 @default.
- W2051289496 hasConceptScore W2051289496C202444582 @default.
- W2051289496 hasConceptScore W2051289496C204321447 @default.
- W2051289496 hasConceptScore W2051289496C21959979 @default.
- W2051289496 hasConceptScore W2051289496C23123220 @default.
- W2051289496 hasConceptScore W2051289496C2779135771 @default.
- W2051289496 hasConceptScore W2051289496C2780451532 @default.
- W2051289496 hasConceptScore W2051289496C28076734 @default.
- W2051289496 hasConceptScore W2051289496C33923547 @default.
- W2051289496 hasConceptScore W2051289496C36503486 @default.
- W2051289496 hasConceptScore W2051289496C41008148 @default.
- W2051289496 hasConceptScore W2051289496C41895202 @default.
- W2051289496 hasConceptScore W2051289496C547195049 @default.
- W2051289496 hasConceptScore W2051289496C9652623 @default.
- W2051289496 hasIssue "3" @default.
- W2051289496 hasLocation W20512894961 @default.
- W2051289496 hasLocation W20512894962 @default.
- W2051289496 hasLocation W20512894963 @default.
- W2051289496 hasOpenAccess W2051289496 @default.
- W2051289496 hasPrimaryLocation W20512894961 @default.
- W2051289496 hasRelatedWork W104581431 @default.
- W2051289496 hasRelatedWork W1463197156 @default.
- W2051289496 hasRelatedWork W1561729373 @default.
- W2051289496 hasRelatedWork W1788528807 @default.
- W2051289496 hasRelatedWork W1975174578 @default.
- W2051289496 hasRelatedWork W2153799433 @default.
- W2051289496 hasRelatedWork W2368651715 @default.
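
The listing above follows a regular "- subject predicate object @graph." layout, so it can be grouped by predicate with a few lines of Python. The following is a minimal sketch, assuming that layout holds for every line. It is inferred from this listing only and is not an official SemOpenAlex serialization. The names `LINE_RE` and `parse_listing` are hypothetical helpers introduced for illustration.

```python
import re
from collections import defaultdict

# Assumed layout of one listing line: "- <subject> <predicate> <object> @<graph>."
LINE_RE = re.compile(r'^-\s+(\S+)\s+(\S+)\s+(.*?)\s+@(\S+?)\.$')


def parse_listing(text):
    """Group objects by predicate for each matching line; skip all other lines."""
    by_predicate = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # e.g. the "Matches in SemOpenAlex for ..." header line
        _subject, predicate, obj, _graph = match.groups()
        # Literals appear in double quotes; strip them, keep bare identifiers as-is.
        if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
            obj = obj[1:-1]
        by_predicate[predicate].append(obj)
    return dict(by_predicate)


# A few lines copied from the listing above.
sample = '''\
- W2051289496 startingPage "329" @default.
- W2051289496 endingPage "343" @default.
- W2051289496 cites W1553019137 @default.
- W2051289496 cites W1934019294 @default.
'''

record = parse_listing(sample)
# Page span derived from the startingPage and endingPage literals.
page_count = int(record["endingPage"][0]) - int(record["startingPage"][0]) + 1
```

Grouping by predicate keeps repeated properties such as `cites`, `hasConcept`, and `hasRelatedWork` together as lists, while single-valued properties such as `title` or `doi` become one-element lists.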