Matches in SemOpenAlex for { <https://semopenalex.org/work/W4378506809> ?p ?o ?g. }
Showing items 1 to 55 of 55 with 100 items per page.
- W4378506809 abstract "The rapid growth of web pages and the increasing complexity of their structure pose a challenge for web mining models. Web mining models are required to understand semi-structured web pages, particularly when little is known about the subject or template of a new page. Current methods migrate language models to web mining by embedding the XML source code into the transformer or encoding the rendered layout with graph neural networks. However, these approaches do not take into account the relationships between text nodes within and across pages. In this paper, we propose a new approach, ReXMiner, for zero-shot relation extraction in web mining. ReXMiner encodes the shortest relative paths in the Document Object Model (DOM) tree, which is a more accurate and efficient signal for key-value pair extraction within a web page. It also incorporates the popularity of each text node by counting the occurrences of the same text node across different web pages. We use contrastive learning to address the issue of sparsity in relation extraction. Extensive experiments on public benchmarks show that our method, ReXMiner, outperforms the state-of-the-art baselines in the task of zero-shot relation extraction in web mining." @default.
- W4378506809 created "2023-05-27" @default.
- W4378506809 creator A5039500313 @default.
- W4378506809 creator A5041840821 @default.
- W4378506809 date "2023-05-23" @default.
- W4378506809 modified "2023-09-30" @default.
- W4378506809 title "Towards Zero-shot Relation Extraction in Web Mining: A Multimodal Approach with Relative XML Path" @default.
- W4378506809 doi "https://doi.org/10.48550/arxiv.2305.13805" @default.
- W4378506809 hasPublicationYear "2023" @default.
- W4378506809 type Work @default.
- W4378506809 citedByCount "0" @default.
- W4378506809 crossrefType "posted-content" @default.
- W4378506809 hasAuthorship W4378506809A5039500313 @default.
- W4378506809 hasAuthorship W4378506809A5041840821 @default.
- W4378506809 hasBestOaLocation W43785068091 @default.
- W4378506809 hasConcept C124101348 @default.
- W4378506809 hasConcept C130436687 @default.
- W4378506809 hasConcept C136764020 @default.
- W4378506809 hasConcept C137922610 @default.
- W4378506809 hasConcept C153604712 @default.
- W4378506809 hasConcept C154945302 @default.
- W4378506809 hasConcept C195807954 @default.
- W4378506809 hasConcept C197046077 @default.
- W4378506809 hasConcept C21959979 @default.
- W4378506809 hasConcept C23123220 @default.
- W4378506809 hasConcept C41008148 @default.
- W4378506809 hasConcept C8797682 @default.
- W4378506809 hasConceptScore W4378506809C124101348 @default.
- W4378506809 hasConceptScore W4378506809C130436687 @default.
- W4378506809 hasConceptScore W4378506809C136764020 @default.
- W4378506809 hasConceptScore W4378506809C137922610 @default.
- W4378506809 hasConceptScore W4378506809C153604712 @default.
- W4378506809 hasConceptScore W4378506809C154945302 @default.
- W4378506809 hasConceptScore W4378506809C195807954 @default.
- W4378506809 hasConceptScore W4378506809C197046077 @default.
- W4378506809 hasConceptScore W4378506809C21959979 @default.
- W4378506809 hasConceptScore W4378506809C23123220 @default.
- W4378506809 hasConceptScore W4378506809C41008148 @default.
- W4378506809 hasConceptScore W4378506809C8797682 @default.
- W4378506809 hasLocation W43785068091 @default.
- W4378506809 hasOpenAccess W4378506809 @default.
- W4378506809 hasPrimaryLocation W43785068091 @default.
- W4378506809 hasRelatedWork W131325339 @default.
- W4378506809 hasRelatedWork W1532697597 @default.
- W4378506809 hasRelatedWork W1548492051 @default.
- W4378506809 hasRelatedWork W1788528807 @default.
- W4378506809 hasRelatedWork W1965510214 @default.
- W4378506809 hasRelatedWork W2155199173 @default.
- W4378506809 hasRelatedWork W2352425915 @default.
- W4378506809 hasRelatedWork W2371618206 @default.
- W4378506809 hasRelatedWork W2560564804 @default.
- W4378506809 hasRelatedWork W2800975405 @default.
- W4378506809 isParatext "false" @default.
- W4378506809 isRetracted "false" @default.
- W4378506809 workType "article" @default.