Matches in SemOpenAlex for { <https://semopenalex.org/work/W4294877672> ?p ?o ?g. }
- W4294877672 endingPage "118718" @default.
- W4294877672 startingPage "118718" @default.
- W4294877672 abstract "Companies have an increasing access to very large datasets within their domain. Analysing these datasets often requires the application of feature selection techniques in order to reduce the dimensionality of the data and prioritize features for downstream knowledge generation tasks. Effective feature selection is a key part of clustering, regression and classification. It presents a myriad of opportunities to improve the machine learning pipeline: eliminating redundant and irrelevant features, reducing model over-fitting, faster model training times and more explainable models. By contrast, and despite the widespread availability and use of categorical data in practice, feature selection for categorical and/or mixed data has received relatively little attention in comparison to numerical data. Furthermore, existing feature selection methods for mixed data are sensitive to number of objects by having nonlinear time complexities with respect to number of objects. In this work, we propose a generic multiple association measure for mixed datasets and a novel feature selection algorithm that uses multiple association across features. Our algorithm is based upon the belief that the most representative chosen set of features should be as diverse and minimally dependent on each other as possible. The proposed algorithm formulates the problem of feature selection as an optimization problem, searching for the set of features that have minimum association amongst them. We present a generic multiple association measure and two associated feature selection algorithms: Naive and Greedy Feature Selection Algorithms called NFSA and GFSA, respectively. Our proposed GFSA algorithm is evaluated on 15 benchmark datasets, and compared to four existing state of the art feature selection techniques. We demonstrate that our approach provides comparable downstream classification performance outperforming other leading techniques on several datasets. Both time complexity analysis and experimental results show that our proposed algorithm significantly reduces the processing time required for unsupervised feature selection algorithms especially for long datasets which have a huge number of objects, whilst also yielding comparable clustering and classification performance. On the other hand, we do not recommend our approach for wide datasets where the number of features is huge with respect to the number of objects e.g., image, text and genome datasets." @default.
- W4294877672 created "2022-09-07" @default.
- W4294877672 creator A5028593139 @default.
- W4294877672 creator A5083112290 @default.
- W4294877672 creator A5083267470 @default.
- W4294877672 creator A5086973117 @default.
- W4294877672 date "2023-02-01" @default.
- W4294877672 modified "2023-09-28" @default.
- W4294877672 title "A multiple association-based unsupervised feature selection algorithm for mixed data sets" @default.
- W4294877672 cites W1963763787 @default.
- W4294877672 cites W1965680102 @default.
- W4294877672 cites W1983150145 @default.
- W4294877672 cites W2014596061 @default.
- W4294877672 cites W2036576956 @default.
- W4294877672 cites W2051317694 @default.
- W4294877672 cites W2060542593 @default.
- W4294877672 cites W2078841894 @default.
- W4294877672 cites W2079361215 @default.
- W4294877672 cites W2084376194 @default.
- W4294877672 cites W2085487226 @default.
- W4294877672 cites W2126693856 @default.
- W4294877672 cites W2128507168 @default.
- W4294877672 cites W2145836846 @default.
- W4294877672 cites W2149230623 @default.
- W4294877672 cites W2151660600 @default.
- W4294877672 cites W2158933803 @default.
- W4294877672 cites W2165700458 @default.
- W4294877672 cites W2344681634 @default.
- W4294877672 cites W2474269906 @default.
- W4294877672 cites W2506743715 @default.
- W4294877672 cites W2550999023 @default.
- W4294877672 cites W2604452321 @default.
- W4294877672 cites W2626280777 @default.
- W4294877672 cites W2740555475 @default.
- W4294877672 cites W2781829441 @default.
- W4294877672 cites W2788471974 @default.
- W4294877672 cites W2911627187 @default.
- W4294877672 cites W2930181195 @default.
- W4294877672 cites W2996149946 @default.
- W4294877672 cites W2998490552 @default.
- W4294877672 cites W3027275488 @default.
- W4294877672 cites W3120493664 @default.
- W4294877672 cites W3159590996 @default.
- W4294877672 cites W3204294171 @default.
- W4294877672 cites W628583573 @default.
- W4294877672 cites W2273892608 @default.
- W4294877672 doi "https://doi.org/10.1016/j.eswa.2022.118718" @default.
- W4294877672 hasPublicationYear "2023" @default.
- W4294877672 type Work @default.
- W4294877672 citedByCount "3" @default.
- W4294877672 countsByYear W42948776722023 @default.
- W4294877672 crossrefType "journal-article" @default.
- W4294877672 hasAuthorship W4294877672A5028593139 @default.
- W4294877672 hasAuthorship W4294877672A5083112290 @default.
- W4294877672 hasAuthorship W4294877672A5083267470 @default.
- W4294877672 hasAuthorship W4294877672A5086973117 @default.
- W4294877672 hasConcept C111030470 @default.
- W4294877672 hasConcept C11413529 @default.
- W4294877672 hasConcept C119857082 @default.
- W4294877672 hasConcept C124101348 @default.
- W4294877672 hasConcept C13280743 @default.
- W4294877672 hasConcept C138885662 @default.
- W4294877672 hasConcept C148483581 @default.
- W4294877672 hasConcept C153180895 @default.
- W4294877672 hasConcept C154945302 @default.
- W4294877672 hasConcept C185798385 @default.
- W4294877672 hasConcept C205649164 @default.
- W4294877672 hasConcept C2776401178 @default.
- W4294877672 hasConcept C41008148 @default.
- W4294877672 hasConcept C41895202 @default.
- W4294877672 hasConcept C51823790 @default.
- W4294877672 hasConcept C5274069 @default.
- W4294877672 hasConcept C58489278 @default.
- W4294877672 hasConcept C73555534 @default.
- W4294877672 hasConcept C81917197 @default.
- W4294877672 hasConceptScore W4294877672C111030470 @default.
- W4294877672 hasConceptScore W4294877672C11413529 @default.
- W4294877672 hasConceptScore W4294877672C119857082 @default.
- W4294877672 hasConceptScore W4294877672C124101348 @default.
- W4294877672 hasConceptScore W4294877672C13280743 @default.
- W4294877672 hasConceptScore W4294877672C138885662 @default.
- W4294877672 hasConceptScore W4294877672C148483581 @default.
- W4294877672 hasConceptScore W4294877672C153180895 @default.
- W4294877672 hasConceptScore W4294877672C154945302 @default.
- W4294877672 hasConceptScore W4294877672C185798385 @default.
- W4294877672 hasConceptScore W4294877672C205649164 @default.
- W4294877672 hasConceptScore W4294877672C2776401178 @default.
- W4294877672 hasConceptScore W4294877672C41008148 @default.
- W4294877672 hasConceptScore W4294877672C41895202 @default.
- W4294877672 hasConceptScore W4294877672C51823790 @default.
- W4294877672 hasConceptScore W4294877672C5274069 @default.
- W4294877672 hasConceptScore W4294877672C58489278 @default.
- W4294877672 hasConceptScore W4294877672C73555534 @default.
- W4294877672 hasConceptScore W4294877672C81917197 @default.
- W4294877672 hasFunder F4320332999 @default.
- W4294877672 hasFunder F4320335254 @default.
- W4294877672 hasLocation W42948776721 @default.
- W4294877672 hasOpenAccess W4294877672 @default.