Matches in SemOpenAlex for { <https://semopenalex.org/work/W82277569> ?p ?o ?g. }
Showing items 1 to 78 of 78 with 100 items per page.
- W82277569 abstract "The use of diagnostic rules based on microarray gene expression data has received wide attention in bioinformatics research. In order to form diagnostic rules, statistical techniques are needed to form classifiers with estimates for their associated error rates, and to correct for any selection biases in the estimates. There are also the associated problems of identifying the genes most useful in making these predictions. Traditional statistical techniques require the number of samples to be much larger than the number of features. Gene expression datasets usually have a small number of samples, but a large number of features. In this thesis, some new techniques are developed, and traditional techniques are used innovatively after appropriate modification to analyse gene expression data. Classification: We first consider classifying tissue samples based on the gene expression data. We employ an external cross-validation with recursive feature elimination to provide classification error rates for tissue samples with different numbers of genes. The techniques are implemented as an R package BCC (Bias-Corrected Classification), and are applied to a number of real-world datasets. The results demonstrate that the error rates vary with different numbers of genes. For each dataset, there is usually an optimal number of genes that returns the lowest cross-validation error rate. Detecting Differentially Expressed Genes: We then consider the detection of genes that are differentially expressed in a given number of classes. As this problem concerns the selection of significant genes from a large pool of candidate genes, it needs to be carried out within the framework of multiple hypothesis testing. The focus is on the use of mixture models to handle the multiplicity issue. The mixture model approach provides a framework for the estimation of the prior probability that a gene is not differentially expressed. 
It estimates various error rates, including the FDR (False Discovery Rate) and the FNR (False Negative Rate). We also develop a method for selecting biomarker genes for classification, based on their repeatability among the highly differentially expressed genes in cross-validation trials. The latter method incorporates both gene selection and classification. Selection Bias: When forming a prediction rule on the basis of a small number of classified tissue samples, some form of feature (gene) selection is usually adopted. This is a necessary step if the number of features is high. As the subset of genes used in the final form of the rule has not been randomly selected but rather chosen according to some criteria designed to reflect the predictive power of the rule, there will be a selection bias inherent in estimates of the error rates of the rule if care is not taken. Various situations are presented where selection biases arise in the formation of a prediction rule and where there is a consequent need for the correction of the biases. Three types of selection biases are analysed: selection bias from not using external cross-validation, selection bias of not working with the full set of genes, and the selection bias from optimizing the classification error rate over a number of subsets obtained according to a selection method. Here we mostly employ the support vector machine with recursive feature elimination. This thesis includes a description of cross-validation schemes that are able to correct for these selection biases. Furthermore, we examine the bias incurred when using the predicted rather than the true outcomes to define the class labels in forming and evaluating the performance of the discriminant rule. Case Study: We present a case study using the breast cancer datasets. In the study, we compare the 70 highly differentially expressed genes proposed by van 't Veer and colleagues, against the set of the genes selected using our repeatability method. 
The results demonstrate that there is more than one set of biomarker genes. We also examine the selection biases that may exist when analysing this dataset. The selection biases are demonstrated to be substantial." @default.
- W82277569 created "2016-06-24" @default.
- W82277569 creator A5082932348 @default.
- W82277569 date "2009-04-01" @default.
- W82277569 modified "2023-09-23" @default.
- W82277569 title "Statistical Analysis of High-Dimensional Gene Expression Data" @default.
- W82277569 hasPublicationYear "2009" @default.
- W82277569 type Work @default.
- W82277569 sameAs 82277569 @default.
- W82277569 citedByCount "0" @default.
- W82277569 crossrefType "journal-article" @default.
- W82277569 hasAuthorship W82277569A5082932348 @default.
- W82277569 hasConcept C104317684 @default.
- W82277569 hasConcept C124101348 @default.
- W82277569 hasConcept C138885662 @default.
- W82277569 hasConcept C148483581 @default.
- W82277569 hasConcept C150194340 @default.
- W82277569 hasConcept C153180895 @default.
- W82277569 hasConcept C154945302 @default.
- W82277569 hasConcept C193244246 @default.
- W82277569 hasConcept C199360897 @default.
- W82277569 hasConcept C2776401178 @default.
- W82277569 hasConcept C40969351 @default.
- W82277569 hasConcept C41008148 @default.
- W82277569 hasConcept C41895202 @default.
- W82277569 hasConcept C54355233 @default.
- W82277569 hasConcept C70721500 @default.
- W82277569 hasConcept C81917197 @default.
- W82277569 hasConcept C8415881 @default.
- W82277569 hasConcept C86803240 @default.
- W82277569 hasConcept C90559484 @default.
- W82277569 hasConcept C95371953 @default.
- W82277569 hasConceptScore W82277569C104317684 @default.
- W82277569 hasConceptScore W82277569C124101348 @default.
- W82277569 hasConceptScore W82277569C138885662 @default.
- W82277569 hasConceptScore W82277569C148483581 @default.
- W82277569 hasConceptScore W82277569C150194340 @default.
- W82277569 hasConceptScore W82277569C153180895 @default.
- W82277569 hasConceptScore W82277569C154945302 @default.
- W82277569 hasConceptScore W82277569C193244246 @default.
- W82277569 hasConceptScore W82277569C199360897 @default.
- W82277569 hasConceptScore W82277569C2776401178 @default.
- W82277569 hasConceptScore W82277569C40969351 @default.
- W82277569 hasConceptScore W82277569C41008148 @default.
- W82277569 hasConceptScore W82277569C41895202 @default.
- W82277569 hasConceptScore W82277569C54355233 @default.
- W82277569 hasConceptScore W82277569C70721500 @default.
- W82277569 hasConceptScore W82277569C81917197 @default.
- W82277569 hasConceptScore W82277569C8415881 @default.
- W82277569 hasConceptScore W82277569C86803240 @default.
- W82277569 hasConceptScore W82277569C90559484 @default.
- W82277569 hasConceptScore W82277569C95371953 @default.
- W82277569 hasLocation W822775691 @default.
- W82277569 hasOpenAccess W82277569 @default.
- W82277569 hasPrimaryLocation W822775691 @default.
- W82277569 hasRelatedWork W110893188 @default.
- W82277569 hasRelatedWork W1564862406 @default.
- W82277569 hasRelatedWork W1586874143 @default.
- W82277569 hasRelatedWork W1965349260 @default.
- W82277569 hasRelatedWork W2000932646 @default.
- W82277569 hasRelatedWork W2007954962 @default.
- W82277569 hasRelatedWork W2017463373 @default.
- W82277569 hasRelatedWork W2070602947 @default.
- W82277569 hasRelatedWork W2073363112 @default.
- W82277569 hasRelatedWork W2103916454 @default.
- W82277569 hasRelatedWork W2107956883 @default.
- W82277569 hasRelatedWork W2117135012 @default.
- W82277569 hasRelatedWork W2214494803 @default.
- W82277569 hasRelatedWork W2737650280 @default.
- W82277569 hasRelatedWork W2946181041 @default.
- W82277569 hasRelatedWork W2950615701 @default.
- W82277569 hasRelatedWork W2951372107 @default.
- W82277569 hasRelatedWork W94883842 @default.
- W82277569 hasRelatedWork W2229878799 @default.
- W82277569 isParatext "false" @default.
- W82277569 isRetracted "false" @default.
- W82277569 magId "82277569" @default.
- W82277569 workType "article" @default.