Matches in SemOpenAlex for { <https://semopenalex.org/work/W2163786136> ?p ?o ?g. }
- W2163786136 abstract "Humans often organize information by encoding it in structures that link together entities such as concepts, objects, properties, etc. Among the various structures possible, hierarchies are commonly used. For instance, taxonomies of categories commonly employ hierarchies to indicate that one category is a type of another. The Yahoo! Web Directory and the Open Directory Project are two examples of large taxonomies where topics are hierarchically arranged. Hierarchies are also used to recursively decompose composite objects into their constituent parts. Examples of this are webpages that can be parsed and then represented as DOM-trees, where the DOM nodes correspond to sections of the webpages. In this thesis we argue that these hierarchical relationships between entities can be exploited to facilitate common data mining tasks defined upon them, like automated classification. Specifically, we show that the information encoded in these hierarchies can be reduced to constraints on class membership scores that can then be enforced as a post-processing step to enhance the accuracy of classification. We demonstrate our ideas and algorithms on three real-world tasks. First, we tackle the problem of classification into hierarchical taxonomies. We show how different taxonomy structures can be translated into constraints on the outputs of classifiers learned at the nodes of the hierarchy. In addition, we give algorithms to optimally enforce these constraints and show that this results in improved classification accuracy. In cases where the taxonomies are not available, we give an approach to automatically derive hierarchical relationships amongst a flat set of categories. Next, we work on the problem of detecting noisy (templated) parts of webpages. We give algorithms that rate each section of a webpage in terms of how templated it is.
Then we show that smoothing the output of these template classifiers over the DOM-tree hierarchy improves the template detection performance of our system. Finally, we investigate the task of segmenting websites into topically cohesive regions. We define a framework and within it a set of measures that characterize good segmentations, and give an efficient algorithm to find the best segmentation within this framework. We formalize the problem of enforcing constraints on the outputs of classifiers as regularized isotonic or unimodal regression on rooted trees; these are generalizations of the classic isotonic regression problem. The nature of the constraints, as well as of the cost functions, differs in each of the applications mentioned above. For all these formulations we give efficient algorithms to optimally smooth the classifier outputs. These novel formulations and algorithms might be of interest independent of the applications in this thesis." @default.
- W2163786136 created "2016-06-24" @default.
- W2163786136 creator A5058100924 @default.
- W2163786136 creator A5089487172 @default.
- W2163786136 date "2007-01-01" @default.
- W2163786136 modified "2023-09-28" @default.
- W2163786136 title "Enhanced classification through exploitation of hierarchical structures" @default.
- W2163786136 cites W1490760466 @default.
- W2163786136 cites W1493526108 @default.
- W2163786136 cites W1502424938 @default.
- W2163786136 cites W1503141391 @default.
- W2163786136 cites W1510526001 @default.
- W2163786136 cites W1522292060 @default.
- W2163786136 cites W1523949738 @default.
- W2163786136 cites W1538524459 @default.
- W2163786136 cites W1550206324 @default.
- W2163786136 cites W1559013041 @default.
- W2163786136 cites W1570978137 @default.
- W2163786136 cites W1574845294 @default.
- W2163786136 cites W1578372979 @default.
- W2163786136 cites W1588401315 @default.
- W2163786136 cites W1592327732 @default.
- W2163786136 cites W1602492977 @default.
- W2163786136 cites W1606091631 @default.
- W2163786136 cites W1620204465 @default.
- W2163786136 cites W1636244751 @default.
- W2163786136 cites W1673310716 @default.
- W2163786136 cites W1676820704 @default.
- W2163786136 cites W1699498167 @default.
- W2163786136 cites W178180052 @default.
- W2163786136 cites W180238871 @default.
- W2163786136 cites W1909864473 @default.
- W2163786136 cites W1965490077 @default.
- W2163786136 cites W1965555277 @default.
- W2163786136 cites W1971784203 @default.
- W2163786136 cites W1979032160 @default.
- W2163786136 cites W1984953641 @default.
- W2163786136 cites W1989338554 @default.
- W2163786136 cites W1995233974 @default.
- W2163786136 cites W1999059106 @default.
- W2163786136 cites W2002932921 @default.
- W2163786136 cites W2005124845 @default.
- W2163786136 cites W2006476860 @default.
- W2163786136 cites W2006560229 @default.
- W2163786136 cites W2012161809 @default.
- W2163786136 cites W2012514949 @default.
- W2163786136 cites W2014566476 @default.
- W2163786136 cites W2019363670 @default.
- W2163786136 cites W2020842694 @default.
- W2163786136 cites W2031841099 @default.
- W2163786136 cites W2038959058 @default.
- W2163786136 cites W2040075907 @default.
- W2163786136 cites W2040672759 @default.
- W2163786136 cites W2040870580 @default.
- W2163786136 cites W2046314114 @default.
- W2163786136 cites W2049633694 @default.
- W2163786136 cites W2052142057 @default.
- W2163786136 cites W2053606794 @default.
- W2163786136 cites W2063862666 @default.
- W2163786136 cites W2064818667 @default.
- W2163786136 cites W2065168033 @default.
- W2163786136 cites W2066680326 @default.
- W2163786136 cites W2072489225 @default.
- W2163786136 cites W2076008912 @default.
- W2163786136 cites W2078288474 @default.
- W2163786136 cites W2083598336 @default.
- W2163786136 cites W2087303323 @default.
- W2163786136 cites W2088906645 @default.
- W2163786136 cites W2090634555 @default.
- W2163786136 cites W2095897464 @default.
- W2163786136 cites W2096765209 @default.
- W2163786136 cites W2097089247 @default.
- W2163786136 cites W2097645701 @default.
- W2163786136 cites W2100445067 @default.
- W2163786136 cites W2100990314 @default.
- W2163786136 cites W2101711129 @default.
- W2163786136 cites W2102524069 @default.
- W2163786136 cites W2103723258 @default.
- W2163786136 cites W2105842272 @default.
- W2163786136 cites W2107008379 @default.
- W2163786136 cites W2107496976 @default.
- W2163786136 cites W2109760176 @default.
- W2163786136 cites W2111446078 @default.
- W2163786136 cites W2112104357 @default.
- W2163786136 cites W2113709557 @default.
- W2163786136 cites W2115082394 @default.
- W2163786136 cites W2115620960 @default.
- W2163786136 cites W2117209866 @default.
- W2163786136 cites W2122465391 @default.
- W2163786136 cites W2124436456 @default.
- W2163786136 cites W2124776405 @default.
- W2163786136 cites W2125055259 @default.
- W2163786136 cites W2125282628 @default.
- W2163786136 cites W2126631147 @default.
- W2163786136 cites W2126751256 @default.
- W2163786136 cites W2127218421 @default.
- W2163786136 cites W2131687179 @default.
- W2163786136 cites W2131840460 @default.
- W2163786136 cites W2132827946 @default.
- W2163786136 cites W2133814403 @default.