Matches in SemOpenAlex for { <https://semopenalex.org/work/W367528235> ?p ?o ?g. }
- W367528235 abstract "The rapidly growing volume and complexity of modern databases make the need for technologies to describe and summarise the information they contain increasingly important. Data mining is a process of extracting implicit, previously unknown and potentially useful patterns and relationships from data, and is widely used in industry and business applications. Rules characterise relationships among patterns in databases, and rule mining is one of the central tasks in data mining. There are fundamentally two categories of rules, namely association rules and classification rules. Traditionally, association rules are connected with transaction databases for market basket problems and classification rules are associated with relational databases for predictions. In this thesis, we will mainly focus on the use of association rules for predictions. An optimal rule set is a rule set that satisfies given optimality criteria. In this thesis we study two types of optimal rule sets, the informative association rule set and the optimal class association rule set, where the informative association rule set is used for market basket predictions and the class association rule set is used for classification. A robust classification rule set is a rule set that is capable of providing more correct predictions than a traditional classification rule set on incomplete test data. Mining transaction databases for association rules usually generates a large number of rules, most of which are unnecessary when used for subsequent prediction. We define a rule set for a given transaction database that is significantly smaller than an association rule set but makes the same predictions as the complete association rule set. We call this rule set the informative rule set. The informative rule set is not constrained to particular target items, and it is smaller than the non-redundant association rule set. 
We characterise the relationships between the informative rule set and the non-redundant association rule set. We present an algorithm that directly generates the informative rule set without generating all frequent itemsets first, and that accesses databases less often than other direct methods. We show experimentally that the informative rule set is much smaller than both the association rule set and the non-redundant association rule set for a given database, and that it can be generated more efficiently. In addition, we discuss a new unsupervised discretization method to deal with numerical attributes in general association rule mining without target specification. Based on the analysis of the strengths and weaknesses of two commonly used unsupervised numerical attribute discretization methods, we present an adaptive numerical attribute merging algorithm that is shown to be better than both methods in general association rule mining. Relational databases are usually denser than transaction databases, so mining them for class association rules, which are association rules whose consequents are classes, may be difficult due to the combinatorial explosion. Based on the analysis of the prediction mechanism, we define an optimal class association rule set to be a subset of the complete class association rule set containing all potentially predictive rules. Using this rule set instead of the complete class association rule set, we can avoid redundant computation that would otherwise be required for mining predictive association rules and hence improve the efficiency of the mining process significantly. We present an efficient algorithm for mining optimal class association rule sets using upward closure properties to prune weak rules before they are actually generated. 
We show theoretically that the efficiency of the proposed algorithm will be greater than that of Apriori on dense databases, and confirm experimentally that it generates an optimal class association rule set, which is very much smaller than a complete class association rule set, in significantly less time than generating the complete class association rule set by Apriori. Traditional classification rule sets perform badly on test data that are not as complete as the training data. We study the problem of discovering more robust rule sets, where we say a rule set is more robust than another rule set if it is able to make more accurate predictions on test data with missing attribute values. We reveal a hierarchy of k-optimal rule sets where a k-optimal rule set with a larger k is more robust, and all of them are more robust than a traditional classification rule set. We introduce two methods to find k-optimal rule sets, namely an optimal association rule mining approach and a heuristic approximate approach. We show experimentally that a k-optimal rule set generated by the optimal association rule mining approach performs better than one generated by the heuristic approximate approach, and that both rule sets perform significantly better than a typical classification rule set (C4.5Rules) on incomplete test data. Finally, we summarise the work discussed in this thesis, and suggest some future research directions." @default.
- W367528235 created "2016-06-24" @default.
- W367528235 creator A5012177739 @default.
- W367528235 date "2002-01-01" @default.
- W367528235 modified "2023-09-28" @default.
- W367528235 title "Optimal and Robust Rule Set Generation" @default.
- W367528235 cites W1482472192 @default.
- W367528235 cites W1483679765 @default.
- W367528235 cites W1485430879 @default.
- W367528235 cites W1496189887 @default.
- W367528235 cites W1506285740 @default.
- W367528235 cites W1510033442 @default.
- W367528235 cites W1520890006 @default.
- W367528235 cites W152264167 @default.
- W367528235 cites W1523989055 @default.
- W367528235 cites W1524454721 @default.
- W367528235 cites W1536551311 @default.
- W367528235 cites W1539166981 @default.
- W367528235 cites W1539171341 @default.
- W367528235 cites W1553696291 @default.
- W367528235 cites W1554325007 @default.
- W367528235 cites W1556507321 @default.
- W367528235 cites W1567313600 @default.
- W367528235 cites W1584197556 @default.
- W367528235 cites W1585397009 @default.
- W367528235 cites W1585743408 @default.
- W367528235 cites W1593058185 @default.
- W367528235 cites W1593431730 @default.
- W367528235 cites W1594031697 @default.
- W367528235 cites W1597910678 @default.
- W367528235 cites W1600769580 @default.
- W367528235 cites W1671614046 @default.
- W367528235 cites W1678889691 @default.
- W367528235 cites W173543053 @default.
- W367528235 cites W1817561967 @default.
- W367528235 cites W187660804 @default.
- W367528235 cites W1918423381 @default.
- W367528235 cites W1927345150 @default.
- W367528235 cites W1948199107 @default.
- W367528235 cites W1969483458 @default.
- W367528235 cites W1986967485 @default.
- W367528235 cites W1988790447 @default.
- W367528235 cites W1996249351 @default.
- W367528235 cites W1999011285 @default.
- W367528235 cites W1999138184 @default.
- W367528235 cites W2000106226 @default.
- W367528235 cites W2000473687 @default.
- W367528235 cites W2004748427 @default.
- W367528235 cites W2013017122 @default.
- W367528235 cites W2014917754 @default.
- W367528235 cites W2023612196 @default.
- W367528235 cites W2030969394 @default.
- W367528235 cites W2037965136 @default.
- W367528235 cites W2040158750 @default.
- W367528235 cites W2042875144 @default.
- W367528235 cites W2045816045 @default.
- W367528235 cites W2054784808 @default.
- W367528235 cites W2064853889 @default.
- W367528235 cites W2067642555 @default.
- W367528235 cites W2081869978 @default.
- W367528235 cites W2084812512 @default.
- W367528235 cites W2085638007 @default.
- W367528235 cites W2089967664 @default.
- W367528235 cites W2094974204 @default.
- W367528235 cites W2097800052 @default.
- W367528235 cites W2112076978 @default.
- W367528235 cites W2112122409 @default.
- W367528235 cites W2115107787 @default.
- W367528235 cites W2117812871 @default.
- W367528235 cites W2120943950 @default.
- W367528235 cites W2125055259 @default.
- W367528235 cites W2125227861 @default.
- W367528235 cites W2126400629 @default.
- W367528235 cites W2136000097 @default.
- W367528235 cites W2136003390 @default.
- W367528235 cites W2140129471 @default.
- W367528235 cites W2140190241 @default.
- W367528235 cites W2141115288 @default.
- W367528235 cites W2147169507 @default.
- W367528235 cites W2149706766 @default.
- W367528235 cites W2152817912 @default.
- W367528235 cites W2154565402 @default.
- W367528235 cites W2154642793 @default.
- W367528235 cites W2156026066 @default.
- W367528235 cites W2156754096 @default.
- W367528235 cites W2158454296 @default.
- W367528235 cites W2160605849 @default.
- W367528235 cites W2168796272 @default.
- W367528235 cites W2170556630 @default.
- W367528235 cites W2172186225 @default.
- W367528235 cites W2210278139 @default.
- W367528235 cites W2535884801 @default.
- W367528235 cites W2613161123 @default.
- W367528235 cites W2766736793 @default.
- W367528235 cites W2912934387 @default.
- W367528235 cites W2999729612 @default.
- W367528235 cites W3028232160 @default.
- W367528235 cites W32120410 @default.
- W367528235 cites W60969841 @default.
- W367528235 cites W62371467 @default.