Matches in SemOpenAlex for { <https://semopenalex.org/work/W63346820> ?p ?o ?g. }
Showing items 1 to 72 of 72, with 100 items per page.
- W63346820 endingPage "315" @default.
- W63346820 startingPage "312" @default.
- W63346820 abstract "The problem of processing huge data sets has been studied for many years. Many valid technologies and methods for dealing with this problem have been developed. Random sampling [1] was proposed by Carlett to solve this problem in 1991, but it cannot work when the number of samples is over 32,000. K. C. Philip divided a big data set into some subsets which fit in memory at first, and then developed a classifier for each subset in parallel [2]. However, its accuracy is less than those processing a data set as a whole. SLIQ [3] and SPRINT [4], developed by IBM Almaden Research Center in 1996, are two important algorithms with the ability of dealing with disk-resident data directly. Their performance is equivalent to that of classical decision tree algorithms. Many other improved algorithms, such as CLOUDS [5] and ScalParC [6], are developed later. RainForest [7] is a framework for fast decision tree construction for large datasets. Its speed and effect are better than SPRINT in some cases. L. A. Ren, Q. He and Z. Z. Shi used hyper surface separation and HSC classification method to classify huge data sets and achieved a good performance [8, 9].Rough Set (RS) [10] is a valid mathematical theory to deal with imprecise, uncertain, and vague information. It has been applied in such fields as machine learning, data mining, intelligent data analyzing and control algorithm acquiring, etc, successfully since it was proposed by Professor Z. Pawlak in 1982. Attribute reduction is a key issue of rough set based knowledge acquisition. Many researchers proposed some algorithms for attribution reduction. These reduction algorithms can be classified into two categories: reduction without attribute order and reduction with attribute order. In 1992, A. Skowron proposed an algorithm for attribute reduction based on discernibility matrix. It's time complexity is t = O(2m × n2), and space complexity is s = O(n2 × m) (m is the number of attributes, n is the number of objects) [11]. In 1995, X. H. Hu improved Skowron's algorithm and proposed a new algorithm for attribute reduction with complexities of t = O(n2) × m3 and s = O(n × m) [12]. In 1996, H. S. Nguyen proposed an algorithm for attribute reduction by sorting decision table. It's complexities are t = O(m2 × n × log n) and s = O(n + m) [13].In 2002, G. Y. Wang proposed an algorithm for attribute reduction based on information entropy. It's complexities are t = O(m2 × n × log n) and s = O(n × m) [14]. In 2003, S. H. Liu proposed an algorithm for attribute reduction by sorting and partitioning universe. It's complexities are O(m2 × n × log n) and s = O(n × m). [15]. In 2001, using Skowron's discernibility matrix, J. Wang proposed an algorithm for attribute reduction based on attribute order. Its complexities are t = O(m × n2) and s = O(m × n2) [16]. In 2004, M. Zhao and J. Wang proposed an algorithm for attribute reduction with tree structure based on attribute order. Its complexities are t = O(m2 × n) and s = O(m × n) [17].However, the efficiency of these reduction algorithms in dealing with huge data sets is not high enough. They are not good enough for application in industry. There are two reasons: one is the time complexity, and the other is the space complexity. Therefore, it is still needed to develop higher efficient algorithm for knowledge reduction.Quick sort for a two dimension table is an important basic operation in data mining. 
In huge data processing based on rough set theory, dividing a decision table into indiscernible classes is a basic operation, and many researchers handle it with the quick sort method. Assuming the data of a two-dimensional table are uniformly distributed, many researchers believe that the average time complexity of quick sort for a two-dimensional table with m attributes and n objects is O(n × log n × m), and thus that the average time complexity of computing the positive region of a decision table is no less than O(n × log n × m), since the time complexity of quick sort for one-dimensional data with n elements is O(n × log n). However, we find that the average time complexity of sorting a two-dimensional table is only O(n × (log n + m)) [18]; when m > log n, O(n × (log n + m)) is approximately O(n × m). The divide and conquer method iteratively splits a complicated problem into simpler sub-problems with the same structure until the sub-problems become small enough to be processed directly. Since the time complexity of sorting a two-dimensional table is just O(n × (m + log n)), and quick sort is a classic divide and conquer method, we may improve the reduction methods of rough set theory using the divide and conquer method. Based on this idea, we have a research plan for quick knowledge reduction based on divide and conquer, with two research frameworks: one based on an attribute order, and one without an attribute order. (1) Quick knowledge reduction based on an attribute order. In some huge databases, the number of attributes is small and it is easy for domain experts to provide an attribute order; in this case, attribute reduction algorithms based on an attribute order are preferable. Combining the divide and conquer method, a quick attribute reduction algorithm based on an attribute order has been developed, with time complexity O(m × n × (m + log n)) and space complexity O(n + m) [19]. (2) Quick knowledge reduction without an attribute order. In some huge databases, the number of attributes is large, even over 1,000, and it is very difficult for domain experts to provide an attribute order; in this case, attribute reduction algorithms without an attribute order are needed. Although many algorithms have been proposed for such applications, their complexities are too high for industrial application. Combining the divide and conquer method, we will develop new knowledge reduction algorithms without an attribute order, aiming at complexities of t = O(m × n × (m + log n)) and s = O(n + m). Within this framework, we already have good results on attribute core computation, with complexities t = O(m × n) and s = O(n + m) [20]." @default.
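The abstract's central complexity claim — that sorting an n-object, m-attribute table averages O(n × (log n + m)) rather than O(n × m × log n) — can be illustrated with a short sketch. The cited papers [18, 19] are not reproduced here, so the following Python is a minimal illustration of the general multi-key quicksort idea (in the style of Bentley–Sedgewick), not the authors' algorithm; the names `mkq_sort` and `indiscernible_classes` are invented for this example. The intuition it demonstrates: later attributes are compared only to break ties on earlier ones, and a fully sorted table yields the indiscernible classes of rough set theory in a single linear scan.

```python
from typing import List, Tuple

Row = Tuple[int, ...]

def mkq_sort(rows: List[Row], attr: int = 0) -> List[Row]:
    """Multi-key quicksort over a two-dimensional table.

    Each partition step compares rows on a single attribute; only the
    group equal to the pivot advances to the next attribute, so later
    attributes are consulted only to break ties.  This is what keeps the
    average cost near O(n * (log n + m)) instead of O(n * m * log n)
    under the uniform-distribution assumption in the abstract.
    """
    if len(rows) <= 1 or attr >= len(rows[0]):
        return rows
    pivot = rows[len(rows) // 2][attr]
    less = [r for r in rows if r[attr] < pivot]
    equal = [r for r in rows if r[attr] == pivot]
    greater = [r for r in rows if r[attr] > pivot]
    return (mkq_sort(less, attr)
            + mkq_sort(equal, attr + 1)   # ties move to the next attribute
            + mkq_sort(greater, attr))

def indiscernible_classes(rows: List[Row]) -> List[List[Row]]:
    """Group a table into indiscernibility classes (rows identical on
    all condition attributes).  After mkq_sort, equal rows are adjacent,
    so one linear scan suffices."""
    classes: List[List[Row]] = []
    for row in mkq_sort(rows):
        if classes and classes[-1][0] == row:
            classes[-1].append(row)
        else:
            classes.append([row])
    return classes

if __name__ == "__main__":
    table = [(1, 0, 2), (0, 1, 2), (1, 0, 2), (0, 1, 1)]
    for cls in indiscernible_classes(table):
        print(cls)
    # -> [(0, 1, 1)], [(0, 1, 2)], [(1, 0, 2), (1, 0, 2)]
```

In a rough set setting, each resulting class would then be checked for decision-attribute consistency to accumulate the positive region; the sketch stops at the partition step, which is the operation whose cost the abstract analyzes.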
- W63346820 created "2016-06-24" @default.
- W63346820 creator A5031220156 @default.
- W63346820 creator A5035486573 @default.
- W63346820 date "2007-11-27" @default.
- W63346820 modified "2023-09-24" @default.
- W63346820 title "Quick Knowledge Reduction Based on Divide and Conquer Method in Huge Data Sets" @default.
- W63346820 cites W1553701556 @default.
- W63346820 cites W1576962511 @default.
- W63346820 cites W1582962693 @default.
- W63346820 cites W1964980811 @default.
- W63346820 cites W1971790955 @default.
- W63346820 cites W2538246427 @default.
- W63346820 cites W4255833381 @default.
- W63346820 doi "https://doi.org/10.1007/978-3-540-77046-6_39" @default.
- W63346820 hasPublicationYear "2007" @default.
- W63346820 type Work @default.
- W63346820 sameAs 63346820 @default.
- W63346820 citedByCount "4" @default.
- W63346820 countsByYear W633468202012 @default.
- W63346820 countsByYear W633468202014 @default.
- W63346820 countsByYear W633468202015 @default.
- W63346820 countsByYear W633468202019 @default.
- W63346820 crossrefType "book-chapter" @default.
- W63346820 hasAuthorship W63346820A5031220156 @default.
- W63346820 hasAuthorship W63346820A5035486573 @default.
- W63346820 hasBestOaLocation W633468201 @default.
- W63346820 hasConcept C111335779 @default.
- W63346820 hasConcept C11413529 @default.
- W63346820 hasConcept C124101348 @default.
- W63346820 hasConcept C2524010 @default.
- W63346820 hasConcept C33923547 @default.
- W63346820 hasConcept C41008148 @default.
- W63346820 hasConcept C71559656 @default.
- W63346820 hasConcept C80444323 @default.
- W63346820 hasConceptScore W63346820C111335779 @default.
- W63346820 hasConceptScore W63346820C11413529 @default.
- W63346820 hasConceptScore W63346820C124101348 @default.
- W63346820 hasConceptScore W63346820C2524010 @default.
- W63346820 hasConceptScore W63346820C33923547 @default.
- W63346820 hasConceptScore W63346820C41008148 @default.
- W63346820 hasConceptScore W63346820C71559656 @default.
- W63346820 hasConceptScore W63346820C80444323 @default.
- W63346820 hasLocation W633468201 @default.
- W63346820 hasOpenAccess W63346820 @default.
- W63346820 hasPrimaryLocation W633468201 @default.
- W63346820 hasRelatedWork W1502528314 @default.
- W63346820 hasRelatedWork W1556523051 @default.
- W63346820 hasRelatedWork W1560324125 @default.
- W63346820 hasRelatedWork W1575902893 @default.
- W63346820 hasRelatedWork W2021227709 @default.
- W63346820 hasRelatedWork W2028461443 @default.
- W63346820 hasRelatedWork W2072893536 @default.
- W63346820 hasRelatedWork W2100449850 @default.
- W63346820 hasRelatedWork W2107335359 @default.
- W63346820 hasRelatedWork W2140429572 @default.
- W63346820 hasRelatedWork W2252777243 @default.
- W63346820 hasRelatedWork W2260789725 @default.
- W63346820 hasRelatedWork W2351785945 @default.
- W63346820 hasRelatedWork W2356436120 @default.
- W63346820 hasRelatedWork W2536258236 @default.
- W63346820 hasRelatedWork W2567686633 @default.
- W63346820 hasRelatedWork W2964132559 @default.
- W63346820 hasRelatedWork W2134857465 @default.
- W63346820 hasRelatedWork W2585158218 @default.
- W63346820 hasRelatedWork W2600666806 @default.
- W63346820 isParatext "false" @default.
- W63346820 isRetracted "false" @default.
- W63346820 magId "63346820" @default.
- W63346820 workType "book-chapter" @default.