Matches in SemOpenAlex for { <https://semopenalex.org/work/W2601386321> ?p ?o ?g. }
- W2601386321 abstract "Data mining methods seek to discover unexpected and interesting regularities, called patterns, in presented data sets. However, the methods often return a collection of patterns for any data set, even a random one. Statistical significance testing can be applied in these scenarios to select the surprising patterns that do not appear as clearly in random data. As each pattern is tested for significance, a set of statistical hypotheses is considered simultaneously. Testing several hypotheses simultaneously is called multiple hypothesis testing, and special treatment is required to adequately control the probability of falsely declaring a pattern statistically significant. However, the traditional methods for multiple hypothesis testing cannot be used in data mining scenarios, because these methods do not consider the problem of a varying set of hypotheses, which is inherent in data mining. This thesis provides an introduction to the problem and reviews some published work on the subject. The focus is on multiple hypothesis testing, specifically in data mining. The problems with traditional multiple hypothesis testing methods in data mining scenarios are discussed, and a solution to these problems is presented. The solution uses randomization, which involves drawing samples of random data sets and running the data mining algorithm on them. The results on the random data sets are then compared with the results on the original data set. Randomization is introduced and discussed in general, and possible randomization schemes in different data mining scenarios are presented. The solution is applied in iterative data mining and biclustering scenarios. Experiments are carried out to demonstrate its utility in these applications.; [Translated from Finnish] Data mining methods aim to find surprising and interesting regularities, called patterns, in a given data set. However, many methods find patterns in any data set, even a completely random one. In these situations, statistical testing can be used to select the surprising patterns that do not appear as strongly in random data. When the statistical significance of many patterns is tested, a set of statistical hypotheses is handled simultaneously. The simultaneous testing of several hypotheses is called multiple hypothesis testing, which requires special measures so that the probability of false conclusions can be controlled. However, typical multiple hypothesis testing methods cannot be used in data mining, because they do not take into account the problem of a varying set of hypotheses, which is typical of data mining. This dissertation introduces the problem and reviews publications related to the topic. The book focuses on multiple hypothesis testing, particularly in data mining settings. The problems of typical multiple hypothesis testing methods in data mining are discussed, and a solution to them is presented. It is based on randomization, in which random data sets are created and the data mining…" @default.
- W2601386321 created "2017-04-07" @default.
- W2601386321 creator A5056086594 @default.
- W2601386321 date "2012-01-01" @default.
- W2601386321 modified "2023-09-26" @default.
- W2601386321 title "Multiple hypothesis testing in data mining" @default.
- W2601386321 cites W118481696 @default.
- W2601386321 cites W144928601 @default.
- W2601386321 cites W1489520413 @default.
- W2601386321 cites W1493217831 @default.
- W2601386321 cites W1508174699 @default.
- W2601386321 cites W151863654 @default.
- W2601386321 cites W1519389640 @default.
- W2601386321 cites W1533169541 @default.
- W2601386321 cites W1549565124 @default.
- W2601386321 cites W1553696291 @default.
- W2601386321 cites W1561581337 @default.
- W2601386321 cites W1573579329 @default.
- W2601386321 cites W1585610988 @default.
- W2601386321 cites W1585646276 @default.
- W2601386321 cites W1596411313 @default.
- W2601386321 cites W1596515083 @default.
- W2601386321 cites W1600293573 @default.
- W2601386321 cites W1965138864 @default.
- W2601386321 cites W196542726 @default.
- W2601386321 cites W1969192912 @default.
- W2601386321 cites W1976352460 @default.
- W2601386321 cites W1978036582 @default.
- W2601386321 cites W1978690156 @default.
- W2601386321 cites W1981446844 @default.
- W2601386321 cites W2002495820 @default.
- W2601386321 cites W2009380894 @default.
- W2601386321 cites W2011332377 @default.
- W2601386321 cites W20184837 @default.
- W2601386321 cites W2018490949 @default.
- W2601386321 cites W2021542402 @default.
- W2601386321 cites W2029817244 @default.
- W2601386321 cites W2030764200 @default.
- W2601386321 cites W2035843433 @default.
- W2601386321 cites W2036328877 @default.
- W2601386321 cites W2037586880 @default.
- W2601386321 cites W2040820996 @default.
- W2601386321 cites W2044189186 @default.
- W2601386321 cites W2051570650 @default.
- W2601386321 cites W2054658115 @default.
- W2601386321 cites W2056760934 @default.
- W2601386321 cites W2058846847 @default.
- W2601386321 cites W2058849889 @default.
- W2601386321 cites W2064787891 @default.
- W2601386321 cites W2066277072 @default.
- W2601386321 cites W20667085 @default.
- W2601386321 cites W2070589300 @default.
- W2601386321 cites W20722260 @default.
- W2601386321 cites W2073459066 @default.
- W2601386321 cites W2077562320 @default.
- W2601386321 cites W2077884493 @default.
- W2601386321 cites W2084491844 @default.
- W2601386321 cites W2090257125 @default.
- W2601386321 cites W2093463665 @default.
- W2601386321 cites W2094613751 @default.
- W2601386321 cites W2099019567 @default.
- W2601386321 cites W2099107563 @default.
- W2601386321 cites W2101111945 @default.
- W2601386321 cites W2101267973 @default.
- W2601386321 cites W2101460669 @default.
- W2601386321 cites W2102862543 @default.
- W2601386321 cites W2105494575 @default.
- W2601386321 cites W2110065044 @default.
- W2601386321 cites W2110893883 @default.
- W2601386321 cites W2111317215 @default.
- W2601386321 cites W2112090702 @default.
- W2601386321 cites W2115482638 @default.
- W2601386321 cites W2117730787 @default.
- W2601386321 cites W2117897510 @default.
- W2601386321 cites W2119043225 @default.
- W2601386321 cites W2119878361 @default.
- W2601386321 cites W2120148445 @default.
- W2601386321 cites W2121044470 @default.
- W2601386321 cites W2125179362 @default.
- W2601386321 cites W2125905177 @default.
- W2601386321 cites W2126449841 @default.
- W2601386321 cites W2128906841 @default.
- W2601386321 cites W2128967345 @default.
- W2601386321 cites W2129830652 @default.
- W2601386321 cites W2130426318 @default.
- W2601386321 cites W2135194391 @default.
- W2601386321 cites W2138180048 @default.
- W2601386321 cites W2138200515 @default.
- W2601386321 cites W2138309709 @default.
- W2601386321 cites W2138660495 @default.
- W2601386321 cites W2142827986 @default.
- W2601386321 cites W2144544802 @default.
- W2601386321 cites W2146008005 @default.
- W2601386321 cites W2148606196 @default.
- W2601386321 cites W2151622806 @default.
- W2601386321 cites W2152668278 @default.
- W2601386321 cites W2153411099 @default.
- W2601386321 cites W2156026066 @default.
- W2601386321 cites W2156737316 @default.
- W2601386321 cites W2159123127 @default.
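
The abstract in this record describes comparing the output of a data mining algorithm on the original data set with its output on randomized data sets. The following is a minimal Python sketch of that general idea, not the procedure from the thesis itself: it uses a generic max-statistic randomization adjustment. The toy miner (`mine_patterns`), the column-shuffling scheme (`randomize`), and all function and parameter names are assumptions made for illustration only.

```python
import random


def mine_patterns(data):
    """Toy 'mining algorithm' (hypothetical): score every pair of binary
    columns by the number of rows in which both columns are 1."""
    n_cols = len(data[0])
    return {
        (i, j): sum(row[i] * row[j] for row in data)
        for i in range(n_cols)
        for j in range(i + 1, n_cols)
    }


def randomize(data, rng):
    """One possible randomization scheme (an assumption): shuffle each column
    independently, which keeps column sums but breaks associations."""
    cols = [list(col) for col in zip(*data)]
    for col in cols:
        rng.shuffle(col)
    return [list(row) for row in zip(*cols)]


def adjusted_p_values(data, n_samples=199, seed=0):
    """Empirical p-values adjusted for multiple testing by comparing each
    observed score with the best score found on each random data set."""
    rng = random.Random(seed)
    observed = mine_patterns(data)
    # Best score the miner finds on each randomized data set.
    max_null = [
        max(mine_patterns(randomize(data, rng)).values())
        for _ in range(n_samples)
    ]
    return {
        pattern: (1 + sum(m >= score for m in max_null)) / (n_samples + 1)
        for pattern, score in observed.items()
    }


if __name__ == "__main__":
    # Columns 0 and 1 are identical; column 2 is unrelated to both.
    data = [[1 if r < 10 else 0, 1 if r < 10 else 0, r % 2] for r in range(20)]
    for pattern, p in sorted(adjusted_p_values(data).items()):
        print(pattern, round(p, 3))
```

Because each observed score is compared with the maximum score over all patterns in a random data set, the adjustment does not depend on fixing the set of hypotheses in advance, which is the difficulty the abstract attributes to traditional methods.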