Matches in SemOpenAlex for { <https://semopenalex.org/work/W64389275> ?p ?o ?g. }
- W64389275 abstract "It is often convenient to make certain assumptions during the learning process. Unfortunately, algorithms built on these assumptions can often break down if the assumptions are not stable between train and test data. Relatedly, we can do better at various tasks (like named entity recognition) by exploiting the richer relationships found in real-world complex systems. By exploiting these kinds of non-conventional regularities we can more easily address problems previously unapproachable, like transfer learning. In the transfer learning setting, the distribution of data is allowed to vary between the training and test domains, that is, the independent and identically distributed (i.i.d.) assumption linking train and test examples is severed. Without this link between the train and test data, traditional learning is difficult. In this thesis we explore learning techniques that can still succeed even in situations where i.i.d. and other common assumptions are allowed to fail. Specifically, we seek out and exploit regularities in the problems we encounter and document which specific assumptions we can drop and under what circumstances and still be able to complete our learning task. We further investigate different methods for dropping, or relaxing, some of these restrictive assumptions so that we may bring more resources (from unlabeled auxiliary data, to known dependencies and other regularities) to bear on the problem, thus producing both better answers to existing problems, and even being able to begin addressing problems previously unanswerable, such as those in the transfer learning setting. In particular, we introduce four techniques for producing robust named entity recognizers, and demonstrate their performance on the problem domain of protein name extraction in biological publications: (1) Feature hierarchies relate distinct, though related, features to one another via a natural linguistically-inspired hierarchy. (2) Structural frequency features exploit a regularity based on the structure of the data itself and the distribution of instances across that structure. (3) Snippets link data not by the distribution of the instances or their features, but by their labels. Thus data that have different attributes, but similar labels, will be joined together, while instances that have similar features, but distinct labels, are segregated to allow for variation between domains. (4) Graph relations represent the entities contained in the data and their relationships to each other as a network which is exploited to help discover robust regularities across domains. Thus we show that learned classifiers and extractors can be made more robust to shifts between the train and test data by using data (both labeled and unlabeled) from related domains and tasks, and by exploiting stable regularities and complex relationships between different aspects of that data." @default.
- W64389275 created "2016-06-24" @default.
- W64389275 creator A5010148946 @default.
- W64389275 creator A5051617344 @default.
- W64389275 date "2009-01-01" @default.
- W64389275 modified "2023-09-26" @default.
- W64389275 title "Exploiting domain and task regularities for robust named entity recognition" @default.
- W64389275 cites W1482328859 @default.
- W64389275 cites W1510579866 @default.
- W64389275 cites W1532470117 @default.
- W64389275 cites W1534979469 @default.
- W64389275 cites W1540550673 @default.
- W64389275 cites W1550206324 @default.
- W64389275 cites W1568620938 @default.
- W64389275 cites W1571969069 @default.
- W64389275 cites W1592796124 @default.
- W64389275 cites W1597032530 @default.
- W64389275 cites W1630959083 @default.
- W64389275 cites W1760863262 @default.
- W64389275 cites W1866403196 @default.
- W64389275 cites W1966553486 @default.
- W64389275 cites W1968004512 @default.
- W64389275 cites W1979495886 @default.
- W64389275 cites W1982626598 @default.
- W64389275 cites W1994913251 @default.
- W64389275 cites W1996739860 @default.
- W64389275 cites W1998839399 @default.
- W64389275 cites W2022775778 @default.
- W64389275 cites W202303397 @default.
- W64389275 cites W204184694 @default.
- W64389275 cites W2048679005 @default.
- W64389275 cites W2049633694 @default.
- W64389275 cites W2056354103 @default.
- W64389275 cites W2058856481 @default.
- W64389275 cites W2063978378 @default.
- W64389275 cites W2064153289 @default.
- W64389275 cites W2068210077 @default.
- W64389275 cites W2079724595 @default.
- W64389275 cites W2091568528 @default.
- W64389275 cites W2096175520 @default.
- W64389275 cites W2096259344 @default.
- W64389275 cites W2097089247 @default.
- W64389275 cites W2101210369 @default.
- W64389275 cites W2101599977 @default.
- W64389275 cites W2102419107 @default.
- W64389275 cites W2103017472 @default.
- W64389275 cites W2104936489 @default.
- W64389275 cites W2107008379 @default.
- W64389275 cites W2107747058 @default.
- W64389275 cites W2108346334 @default.
- W64389275 cites W2109705661 @default.
- W64389275 cites W2111557120 @default.
- W64389275 cites W2113920009 @default.
- W64389275 cites W2114296159 @default.
- W64389275 cites W2118029084 @default.
- W64389275 cites W2120354757 @default.
- W64389275 cites W2120708938 @default.
- W64389275 cites W2129620481 @default.
- W64389275 cites W2130903752 @default.
- W64389275 cites W2131953535 @default.
- W64389275 cites W2133013156 @default.
- W64389275 cites W2133532371 @default.
- W64389275 cites W2134125014 @default.
- W64389275 cites W2135046866 @default.
- W64389275 cites W2135352205 @default.
- W64389275 cites W2136504847 @default.
- W64389275 cites W2136979193 @default.
- W64389275 cites W2138621811 @default.
- W64389275 cites W2143419558 @default.
- W64389275 cites W2145677303 @default.
- W64389275 cites W2147880316 @default.
- W64389275 cites W2148029428 @default.
- W64389275 cites W2148603752 @default.
- W64389275 cites W2152455533 @default.
- W64389275 cites W2157110051 @default.
- W64389275 cites W2158108973 @default.
- W64389275 cites W2159882563 @default.
- W64389275 cites W2162461580 @default.
- W64389275 cites W2163828439 @default.
- W64389275 cites W2166096645 @default.
- W64389275 cites W2166474856 @default.
- W64389275 cites W2167044614 @default.
- W64389275 cites W2168905447 @default.
- W64389275 cites W2169591950 @default.
- W64389275 cites W2293363371 @default.
- W64389275 cites W2293576742 @default.
- W64389275 cites W2420733993 @default.
- W64389275 cites W2785349534 @default.
- W64389275 cites W2913340405 @default.
- W64389275 cites W2964041960 @default.
- W64389275 cites W3099514962 @default.
- W64389275 cites W317108637 @default.
- W64389275 cites W60337842 @default.
- W64389275 cites W71647219 @default.
- W64389275 cites W3151309449 @default.
- W64389275 hasPublicationYear "2009" @default.
- W64389275 type Work @default.
- W64389275 sameAs 64389275 @default.
- W64389275 citedByCount "4" @default.
- W64389275 countsByYear W643892752012 @default.