Matches in SemOpenAlex for { <https://semopenalex.org/work/W938128466> ?p ?o ?g. }
- W938128466 abstract "In the first part of the thesis we explore three fundamental questions that arise naturally when we conceive a machine learning scenario where the training and test distributions can differ. Contrary to conventional wisdom, we show that in fact mismatched training and test distributions can yield better out-of-sample performance. This optimal performance can be obtained by training with the dual distribution. This optimal training distribution depends on the test distribution set by the problem, but not on the target function that we want to learn. We show how to obtain this distribution in both discrete and continuous input spaces, as well as how to approximate it in a practical scenario. Benefits of using this distribution are exemplified in both synthetic and real data sets. In order to apply the dual distribution in the supervised learning scenario where the training data set is fixed, it is necessary to use weights to make the sample appear as if it came from the dual distribution. We explore the negative effect that weighting a sample can have. The theoretical decomposition of the use of weights regarding its effect on the out-of-sample error is easy to understand but not actionable in practice, as the quantities involved cannot be computed. Hence, we propose the Targeted Weighting algorithm that determines if, for a given set of weights, the out-of-sample performance will improve or not in a practical setting. This is necessary as the setting assumes there are no labeled points distributed according to the test distribution, only unlabeled samples. Finally, we propose a new class of matching algorithms that can be used to match the training set to a desired distribution, such as the dual distribution (or the test distribution). These algorithms can be applied to very large datasets, and we show how they lead to improved performance in a large real dataset such as the Netflix dataset.
Their computational complexity is the main reason for their advantage over previous algorithms proposed in the covariate shift literature. In the second part of the thesis we apply Machine Learning to the problem of behavior recognition. We develop a specific behavior classifier to study fly aggression, and we develop a system that allows analyzing behavior in videos of animals, with minimal supervision. The system, which we call CUBA (Caltech Unsupervised Behavior Analysis), allows detecting movemes, actions, and stories from time series describing the position of animals in videos. The method summarizes the data and provides biologists with a mathematical tool to test new hypotheses. Other benefits of CUBA include finding classifiers for specific behaviors without the need for annotation, as well as providing means to discriminate groups of animals, for example, according to their genetic line." @default.
- W938128466 created "2016-06-24" @default.
- W938128466 creator A5018249685 @default.
- W938128466 creator A5044971917 @default.
- W938128466 date "2015-01-01" @default.
- W938128466 modified "2023-09-27" @default.
- W938128466 title "Optimal Data Distributions in Machine Learning" @default.
- W938128466 cites W116902681 @default.
- W938128466 cites W1493730910 @default.
- W938128466 cites W152645600 @default.
- W938128466 cites W1544681709 @default.
- W938128466 cites W1585385982 @default.
- W938128466 cites W1617650991 @default.
- W938128466 cites W1680579736 @default.
- W938128466 cites W1751719206 @default.
- W938128466 cites W1853837125 @default.
- W938128466 cites W189742998 @default.
- W938128466 cites W1966026565 @default.
- W938128466 cites W1977591411 @default.
- W938128466 cites W1978380814 @default.
- W938128466 cites W1987692590 @default.
- W938128466 cites W1993695976 @default.
- W938128466 cites W1994389483 @default.
- W938128466 cites W2014268383 @default.
- W938128466 cites W2024244622 @default.
- W938128466 cites W2026386069 @default.
- W938128466 cites W203025129 @default.
- W938128466 cites W2032536435 @default.
- W938128466 cites W2034368206 @default.
- W938128466 cites W2034920727 @default.
- W938128466 cites W2048679005 @default.
- W938128466 cites W2052664531 @default.
- W938128466 cites W2062179223 @default.
- W938128466 cites W2068039256 @default.
- W938128466 cites W2081850149 @default.
- W938128466 cites W2099129729 @default.
- W938128466 cites W2102689555 @default.
- W938128466 cites W2103851188 @default.
- W938128466 cites W2104094955 @default.
- W938128466 cites W2106162769 @default.
- W938128466 cites W2107298017 @default.
- W938128466 cites W2108263314 @default.
- W938128466 cites W2111355007 @default.
- W938128466 cites W2111362445 @default.
- W938128466 cites W2112483442 @default.
- W938128466 cites W2114338449 @default.
- W938128466 cites W2118020555 @default.
- W938128466 cites W2121495423 @default.
- W938128466 cites W2121981798 @default.
- W938128466 cites W2122244877 @default.
- W938128466 cites W2126442689 @default.
- W938128466 cites W2129851978 @default.
- W938128466 cites W2131953535 @default.
- W938128466 cites W2132585078 @default.
- W938128466 cites W2135563396 @default.
- W938128466 cites W2146871184 @default.
- W938128466 cites W2152825437 @default.
- W938128466 cites W2153635508 @default.
- W938128466 cites W2156324002 @default.
- W938128466 cites W2158108973 @default.
- W938128466 cites W2160977456 @default.
- W938128466 cites W2162651021 @default.
- W938128466 cites W2163302275 @default.
- W938128466 cites W2165874743 @default.
- W938128466 cites W2167879950 @default.
- W938128466 cites W2168183996 @default.
- W938128466 cites W2170612786 @default.
- W938128466 cites W2296319761 @default.
- W938128466 cites W2811380766 @default.
- W938128466 cites W2962998867 @default.
- W938128466 cites W3120740533 @default.
- W938128466 cites W3127518054 @default.
- W938128466 cites W2311233238 @default.
- W938128466 cites W3149820480 @default.
- W938128466 doi "https://doi.org/10.7907/z9dr2sd5." @default.
- W938128466 hasPublicationYear "2015" @default.
- W938128466 type Work @default.
- W938128466 sameAs 938128466 @default.
- W938128466 citedByCount "0" @default.
- W938128466 crossrefType "dissertation" @default.
- W938128466 hasAuthorship W938128466A5018249685 @default.
- W938128466 hasAuthorship W938128466A5044971917 @default.
- W938128466 hasConcept C110121322 @default.
- W938128466 hasConcept C119857082 @default.
- W938128466 hasConcept C124101348 @default.
- W938128466 hasConcept C124952713 @default.
- W938128466 hasConcept C126255220 @default.
- W938128466 hasConcept C126838900 @default.
- W938128466 hasConcept C134306372 @default.
- W938128466 hasConcept C14036430 @default.
- W938128466 hasConcept C142362112 @default.
- W938128466 hasConcept C154945302 @default.
- W938128466 hasConcept C177264268 @default.
- W938128466 hasConcept C183115368 @default.
- W938128466 hasConcept C185592680 @default.
- W938128466 hasConcept C198531522 @default.
- W938128466 hasConcept C199360897 @default.
- W938128466 hasConcept C2780980858 @default.
- W938128466 hasConcept C33923547 @default.
- W938128466 hasConcept C41008148 @default.