Matches in SemOpenAlex for { <https://semopenalex.org/work/W3021482264> ?p ?o ?g. }
Showing items 1 to 49 of 49 with 100 items per page.
- W3021482264 abstract "The class imbalance problem can be considered one of the top problems in data mining today, as it is present in many real-world domains such as computer science, epidemiology, finance and so on. This has brought growing attention from both academia and industry. In this master thesis, a critical study of the nature of the problem, the state-of-the-art solutions, an explanation of specific performance measures and a real application of the problem has been carried out. In particular, the first part of the work presents a discussion of the data imbalance problem itself. We will analyze how the skewed distribution affects standard classification learning algorithms, which are generally biased towards the majority class. The reason is generally rooted in the structure of the classifier's learning process, which is often built with a view to optimizing global metrics such as accuracy. This might lead to distorted conclusions about performance: for example, a classifier that achieves an accuracy of 99% on data with an imbalance ratio (the ratio between majority class instances and minority ones) of 1 may simply have classified all elements as belonging to the majority class, so its performance is not as accurate as it seems. However, the imbalanced distribution of the data is not the only factor that hinders the learning task. Several intrinsic characteristics of the data have a strong impact on classification performance. An explanation of the problem of small disjuncts, the overlap between classes, and the presence of noise and borderline examples will be presented, showing how they affect the learning process. In the second part of the thesis, state-of-the-art solutions to these issues are presented. They can be divided into four groups: data level, algorithm level, cost-sensitive and ensemble methods.
Data level approaches [\cite{chawla2002smote}, \cite{han2005borderline}, \cite{bunkhumpornpat2009safe}, \cite{he2008adasyn}, \cite{yun2016automatic}, \cite{wilson1972asymptotic}, \cite{tomek1976two}, \cite{kubat1997addressing}, \cite{laurikkala2001improving}] use sampling methods to balance the class distribution. Resampling techniques can be categorized into three groups: undersampling, oversampling and hybrids. Algorithm level or internal approaches aim to improve the learning process, acting on the classifier itself or on the training data. Cost-sensitive approaches operate at the data level, the algorithm level or both. The objective of this kind of solution is to assign a different misclassification cost to each class. Ensembles combine all these approaches: they consist in training several classifiers and then aggregating their predictions in order to handle overfitting problems. The two most famous ensemble techniques are Bagging and Boosting. Finally, as an application, a case study developed during an internship at Reale Mutua Assicurazioni will be provided. In this final part, several experiments will be conducted to cope with the imbalance problem. First, the performance of standard classifiers such as SVM, logistic regression, decision tree and random forest will be analyzed, underlining the critical issues of the different classifiers; then their performance will be improved by employing data level techniques such as SMOTE, ADASYN, RUS, ROS, Tomek link and Kmeans SMOTE [\cite{last2017oversampling}]. The experimental results will show that the decision tree classifier outperforms the other classifiers in terms of F-measure when ROS is used as re" @default.
- W3021482264 created "2020-05-13" @default.
- W3021482264 creator A5081174684 @default.
- W3021482264 date "2019-12-02" @default.
- W3021482264 modified "2023-09-27" @default.
- W3021482264 title "classification of imbalanced data applied to insurance market" @default.
- W3021482264 hasPublicationYear "2019" @default.
- W3021482264 type Work @default.
- W3021482264 sameAs 3021482264 @default.
- W3021482264 citedByCount "0" @default.
- W3021482264 crossrefType "journal-article" @default.
- W3021482264 hasAuthorship W3021482264A5081174684 @default.
- W3021482264 hasConcept C119857082 @default.
- W3021482264 hasConcept C124101348 @default.
- W3021482264 hasConcept C154945302 @default.
- W3021482264 hasConcept C41008148 @default.
- W3021482264 hasConcept C95623464 @default.
- W3021482264 hasConceptScore W3021482264C119857082 @default.
- W3021482264 hasConceptScore W3021482264C124101348 @default.
- W3021482264 hasConceptScore W3021482264C154945302 @default.
- W3021482264 hasConceptScore W3021482264C41008148 @default.
- W3021482264 hasConceptScore W3021482264C95623464 @default.
- W3021482264 hasLocation W30214822641 @default.
- W3021482264 hasOpenAccess W3021482264 @default.
- W3021482264 hasPrimaryLocation W30214822641 @default.
- W3021482264 hasRelatedWork W1480663908 @default.
- W3021482264 hasRelatedWork W1570070578 @default.
- W3021482264 hasRelatedWork W2027347092 @default.
- W3021482264 hasRelatedWork W2137822999 @default.
- W3021482264 hasRelatedWork W2359478778 @default.
- W3021482264 hasRelatedWork W2368180796 @default.
- W3021482264 hasRelatedWork W2412629076 @default.
- W3021482264 hasRelatedWork W2742613066 @default.
- W3021482264 hasRelatedWork W2889487566 @default.
- W3021482264 hasRelatedWork W2909588969 @default.
- W3021482264 hasRelatedWork W2914589845 @default.
- W3021482264 hasRelatedWork W3005040103 @default.
- W3021482264 hasRelatedWork W3008103964 @default.
- W3021482264 hasRelatedWork W3035777675 @default.
- W3021482264 hasRelatedWork W3047645378 @default.
- W3021482264 hasRelatedWork W3135092014 @default.
- W3021482264 hasRelatedWork W3137577999 @default.
- W3021482264 hasRelatedWork W3154235440 @default.
- W3021482264 hasRelatedWork W3185165902 @default.
- W3021482264 hasRelatedWork W866358875 @default.
- W3021482264 isParatext "false" @default.
- W3021482264 isRetracted "false" @default.
- W3021482264 magId "3021482264" @default.
- W3021482264 workType "article" @default.
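
The abstract of W3021482264 argues that accuracy can mislead on skewed data and names random oversampling (ROS) as one data level remedy. The following is a minimal sketch of both ideas in plain Python, not the thesis's own code. The 99:1 toy labels, the placeholder features and the helper names `accuracy`, `f_measure` and `random_oversample` are assumptions chosen for illustration only.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """F1 score for the class given by `positive` (here the minority class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class is as large as the largest one (a basic form of ROS)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 99 majority (0) and 1 minority (1) instances.
y_true = [0] * 99 + [1]
# A trivial "classifier" that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)   # high, although no minority case is found
f1 = f_measure(y_true, y_pred)   # exposes the failure on the minority class

# Placeholder one-dimensional features, one per instance.
X = [[float(i)] for i in range(100)]
X_bal, y_bal = random_oversample(X, y_true)
balanced_counts = Counter(y_bal)

print(acc, f1, dict(balanced_counts))
```

Because ROS only duplicates existing minority samples, it should be applied to the training split alone. Oversampling before the train/test split would leak copies of the same instance into the evaluation set.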