Matches in SemOpenAlex for { <https://semopenalex.org/work/W2299503175> ?p ?o ?g. }
- W2299503175 abstract "Building predictive models in computational biology involves three key elements: choosing an appropriate scoring model for the task at hand; developing efficient inference algorithms for making predictions; and optimizing scoring parameters so that the generated predictions are biologically meaningful. In many subareas of computational biology, research in predictive methods for biology has focused on the first two steps, whereas methods used for scoring parameter estimation often rely on a scattered combination of techniques, ranging from ad hoc statistical analysis and physicochemical arguments to manual trial-and-error. In this thesis, we consider the problem of scoring parameter estimation for three key problems in computational biology: protein sequence alignment, RNA secondary structure prediction, and RNA simultaneous folding and alignment. We formulate the model estimation task as a special class of supervised machine learning problems where the goal is to learn a mapping from a structured input space (e.g., amino acid or RNA sequences) to a structured output space (e.g., alignments or foldings). Under this framework, the problem of model estimation reduces to solving a convex optimization problem. Following this setup, we design structured probabilistic or max-margin models for each task. To allow our algorithms to scale efficiently to large-scale training sets, we develop new fast online and batch convex optimization algorithms specially tailored for learning structured models. We also develop an automated approach for designing custom regularization penalties to prevent overfitting in feature-rich scoring models. The resulting software packages for alignment (CONTRAlign), secondary structure prediction (CONTRAfold), and simultaneous alignment and folding (RAF) each obtain state-of-the-art accuracy in their respective domains. 
In particular, our alignment algorithm, CONTRAlign, obtains substantially improved sensitivity for the difficult class of twilight zone alignments. Our RNA secondary structure prediction algorithm, CONTRAfold, achieves higher general accuracy than existing classical methods, demonstrating for the first time that a statistically estimated scoring model can outperform thermodynamic approaches. Finally, our RNA simultaneous folding and alignment program, RAF, achieves high accuracies while also taking advantage of new sparsity heuristics to achieve running times orders of magnitude faster than previous approaches." @default.
- W2299503175 created "2016-06-24" @default.
- W2299503175 creator A5004941913 @default.
- W2299503175 creator A5023259221 @default.
- W2299503175 date "2009-01-01" @default.
- W2299503175 modified "2023-09-27" @default.
- W2299503175 title "Discriminative structured models for biological sequence analysis" @default.
- W2299503175 cites W1489075871 @default.
- W2299503175 cites W1491374429 @default.
- W2299503175 cites W1500214331 @default.
- W2299503175 cites W1508225465 @default.
- W2299503175 cites W1522740115 @default.
- W2299503175 cites W1531414226 @default.
- W2299503175 cites W1543796220 @default.
- W2299503175 cites W1567512734 @default.
- W2299503175 cites W1567621547 @default.
- W2299503175 cites W1583497301 @default.
- W2299503175 cites W1585306636 @default.
- W2299503175 cites W1585529040 @default.
- W2299503175 cites W1647671624 @default.
- W2299503175 cites W1802388531 @default.
- W2299503175 cites W1843195819 @default.
- W2299503175 cites W1846926012 @default.
- W2299503175 cites W1866987985 @default.
- W2299503175 cites W1909512798 @default.
- W2299503175 cites W1934021597 @default.
- W2299503175 cites W1969153299 @default.
- W2299503175 cites W1969551552 @default.
- W2299503175 cites W1973578915 @default.
- W2299503175 cites W1979147581 @default.
- W2299503175 cites W1983534198 @default.
- W2299503175 cites W1984675364 @default.
- W2299503175 cites W1992361112 @default.
- W2299503175 cites W1998585544 @default.
- W2299503175 cites W2001573278 @default.
- W2299503175 cites W2003787247 @default.
- W2299503175 cites W2004178884 @default.
- W2299503175 cites W2004228538 @default.
- W2299503175 cites W2004575181 @default.
- W2299503175 cites W2004915807 @default.
- W2299503175 cites W2005688170 @default.
- W2299503175 cites W2006290495 @default.
- W2299503175 cites W2006903949 @default.
- W2299503175 cites W2008652694 @default.
- W2299503175 cites W2009570821 @default.
- W2299503175 cites W2009736453 @default.
- W2299503175 cites W2009818318 @default.
- W2299503175 cites W2011368877 @default.
- W2299503175 cites W2013465290 @default.
- W2299503175 cites W2014553879 @default.
- W2299503175 cites W2017302704 @default.
- W2299503175 cites W2018255407 @default.
- W2299503175 cites W2026881360 @default.
- W2299503175 cites W2035042171 @default.
- W2299503175 cites W2035720976 @default.
- W2299503175 cites W2041928877 @default.
- W2299503175 cites W2042881722 @default.
- W2299503175 cites W2048814860 @default.
- W2299503175 cites W2062018285 @default.
- W2299503175 cites W2062233253 @default.
- W2299503175 cites W2074231493 @default.
- W2299503175 cites W2081778904 @default.
- W2299503175 cites W2082087000 @default.
- W2299503175 cites W2082988498 @default.
- W2299503175 cites W2085115334 @default.
- W2299503175 cites W2085277871 @default.
- W2299503175 cites W2085684972 @default.
- W2299503175 cites W2086240273 @default.
- W2299503175 cites W2091371007 @default.
- W2299503175 cites W2092423930 @default.
- W2299503175 cites W2094386736 @default.
- W2299503175 cites W2095145214 @default.
- W2299503175 cites W2097826433 @default.
- W2299503175 cites W2099383450 @default.
- W2299503175 cites W2101220662 @default.
- W2299503175 cites W2102046276 @default.
- W2299503175 cites W2102502076 @default.
- W2299503175 cites W2103899452 @default.
- W2299503175 cites W2104216515 @default.
- W2299503175 cites W2105636360 @default.
- W2299503175 cites W2105644991 @default.
- W2299503175 cites W2105801262 @default.
- W2299503175 cites W2106129230 @default.
- W2299503175 cites W2106171469 @default.
- W2299503175 cites W2106293441 @default.
- W2299503175 cites W2106882534 @default.
- W2299503175 cites W2108642468 @default.
- W2299503175 cites W2111652614 @default.
- W2299503175 cites W2111773652 @default.
- W2299503175 cites W2111937814 @default.
- W2299503175 cites W2114143204 @default.
- W2299503175 cites W2114520383 @default.
- W2299503175 cites W2122035121 @default.
- W2299503175 cites W2123241418 @default.
- W2299503175 cites W2124162379 @default.
- W2299503175 cites W2125838338 @default.
- W2299503175 cites W2127122014 @default.
- W2299503175 cites W2127774996 @default.
- W2299503175 cites W2129160848 @default.
- W2299503175 cites W2131348505 @default.