Matches in SemOpenAlex for { <https://semopenalex.org/work/W2138026602> ?p ?o ?g. }
- W2138026602 abstract "The nature of semistructured data in web collections is evolving. Even when XML web documents are valid with regard to a schema, the actual structure of such documents exhibits significant variations across collections for several reasons: an XML schema may be very lax (e.g., to accommodate the flexibility needed to represent collections of documents in RSS [1] feeds), a schema may be large, with different subsets used for different documents (e.g., this is common in industry standards like UBL [2]), or open content models may allow arbitrary schemas to be mixed (e.g., RSS extensions like those used for podcasting). A schema alone may not provide sufficient information for many data management tasks that require knowledge of the actual structure of the collection. Web applications (such as processing RSS feeds or web service messages) rely on XPath-based data manipulation tools. Web developers need to use XPath queries effectively on increasingly large web collections containing hundreds of thousands of XML documents. Even when tasks only need to deal with a single document at a time, developers benefit from understanding the behaviour of XPath expressions across multiple documents (e.g., what will a query return when run over the thousands of hourly feeds collected during the last few months?). Dealing with the (highly variable) structure of such web collections poses additional challenges. This thesis introduces DescribeX, a powerful framework that is capable of describing arbitrarily complex XML summaries of web collections, providing support for more efficient evaluation of XPath workloads. DescribeX permits the declarative description of document structure using all axes and language constructs in XPath, and generalizes many of the XML indexing and summarization approaches in the literature.
DescribeX supports the construction of heterogeneous summaries where different document elements sharing a common structure can be declaratively defined and refined by means of path regular expressions on axes, or axis path regular expressions (AxPREs). DescribeX can significantly help in the understanding of both the structure of complex, heterogeneous XML collections and the behaviour of XPath queries evaluated on them. Experimental results demonstrate the scalability of DescribeX summary refinements and stabilizations (the key enablers for tailoring summaries) with multi-gigabyte web collections. A comparative study suggests that using a DescribeX summary created from a given workload can produce query evaluation times orders of magnitude better than using existing summaries. DescribeX’s lightweight approach of combining summaries with a file-at-a-time XPath processor can be a very competitive alternative, in terms of performance, to conventional fully-fledged XML query engines that provide DB-like functionality such as security, transaction processing, and native storage. [1] http://www.rss-specifications.com/; [2] http://oasis-open.org/committees/ubl/" @default.
- W2138026602 created "2016-06-24" @default.
- W2138026602 creator A5021938785 @default.
- W2138026602 date "2008-01-01" @default.
- W2138026602 modified "2023-09-28" @default.
- W2138026602 title "DescribeX: a framework for exploring and querying XML web collections" @default.
- W2138026602 cites W1481200545 @default.
- W2138026602 cites W1482220696 @default.
- W2138026602 cites W1511201126 @default.
- W2138026602 cites W1515993376 @default.
- W2138026602 cites W1524459778 @default.
- W2138026602 cites W1527806945 @default.
- W2138026602 cites W1530044031 @default.
- W2138026602 cites W1538827254 @default.
- W2138026602 cites W1547831574 @default.
- W2138026602 cites W1554673294 @default.
- W2138026602 cites W1601989522 @default.
- W2138026602 cites W1971729277 @default.
- W2138026602 cites W1975887898 @default.
- W2138026602 cites W1983604014 @default.
- W2138026602 cites W1988668397 @default.
- W2138026602 cites W2002089154 @default.
- W2138026602 cites W2025695428 @default.
- W2138026602 cites W2030166143 @default.
- W2138026602 cites W2035902703 @default.
- W2138026602 cites W2046904770 @default.
- W2138026602 cites W2048867746 @default.
- W2138026602 cites W2061884758 @default.
- W2138026602 cites W2068361557 @default.
- W2138026602 cites W2081052629 @default.
- W2138026602 cites W2096768150 @default.
- W2138026602 cites W2097396510 @default.
- W2138026602 cites W2099552474 @default.
- W2138026602 cites W2099686928 @default.
- W2138026602 cites W2102696193 @default.
- W2138026602 cites W2105748890 @default.
- W2138026602 cites W2110089831 @default.
- W2138026602 cites W2110459974 @default.
- W2138026602 cites W2110474297 @default.
- W2138026602 cites W2115221160 @default.
- W2138026602 cites W2119677080 @default.
- W2138026602 cites W2121800881 @default.
- W2138026602 cites W2122353940 @default.
- W2138026602 cites W2122530852 @default.
- W2138026602 cites W2122610012 @default.
- W2138026602 cites W2122956021 @default.
- W2138026602 cites W2123627092 @default.
- W2138026602 cites W2124073911 @default.
- W2138026602 cites W2124325155 @default.
- W2138026602 cites W2124391840 @default.
- W2138026602 cites W2132546253 @default.
- W2138026602 cites W2134356404 @default.
- W2138026602 cites W2134826526 @default.
- W2138026602 cites W2135282325 @default.
- W2138026602 cites W2136016195 @default.
- W2138026602 cites W2139475358 @default.
- W2138026602 cites W2141280272 @default.
- W2138026602 cites W2142560248 @default.
- W2138026602 cites W2142876691 @default.
- W2138026602 cites W2143100570 @default.
- W2138026602 cites W2145186067 @default.
- W2138026602 cites W2151335566 @default.
- W2138026602 cites W2151358465 @default.
- W2138026602 cites W2155478465 @default.
- W2138026602 cites W2156184447 @default.
- W2138026602 cites W2156664231 @default.
- W2138026602 cites W2159701549 @default.
- W2138026602 cites W2159793945 @default.
- W2138026602 cites W2160442532 @default.
- W2138026602 cites W2161728285 @default.
- W2138026602 cites W2168506731 @default.
- W2138026602 cites W2169131377 @default.
- W2138026602 cites W2169424245 @default.
- W2138026602 cites W2171125533 @default.
- W2138026602 cites W2172007538 @default.
- W2138026602 cites W2202508117 @default.
- W2138026602 cites W2294859229 @default.
- W2138026602 cites W2295596515 @default.
- W2138026602 cites W85188112 @default.
- W2138026602 hasPublicationYear "2008" @default.
- W2138026602 type Work @default.
- W2138026602 sameAs 2138026602 @default.
- W2138026602 citedByCount "0" @default.
- W2138026602 crossrefType "dissertation" @default.
- W2138026602 hasAuthorship W2138026602A5021938785 @default.
- W2138026602 hasConcept C136764020 @default.
- W2138026602 hasConcept C23123220 @default.
- W2138026602 hasConcept C2385561 @default.
- W2138026602 hasConcept C2780213375 @default.
- W2138026602 hasConcept C34716815 @default.
- W2138026602 hasConcept C41008148 @default.
- W2138026602 hasConcept C52146309 @default.
- W2138026602 hasConcept C55348073 @default.
- W2138026602 hasConcept C68699486 @default.
- W2138026602 hasConcept C77088390 @default.
- W2138026602 hasConcept C8797682 @default.
- W2138026602 hasConceptScore W2138026602C136764020 @default.
- W2138026602 hasConceptScore W2138026602C23123220 @default.
- W2138026602 hasConceptScore W2138026602C2385561 @default.
- W2138026602 hasConceptScore W2138026602C2780213375 @default.