Matches in SemOpenAlex for { <https://semopenalex.org/work/W3011055447> ?p ?o ?g. }
- W3011055447 abstract "Method. 18 March 2020. Open Access. Transparent process. Improved detection of differentially represented DNA barcodes for high-throughput clonal phenomics. Yevhen Akimov, orcid.org/0000-0003-0413-2564, Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland. Daria Bulanova, Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland; Biotech Research and Innovation Centre (BRIC) and Novo Nordisk Foundation Center for Stem Cell Biology (DanStem), University of Copenhagen, Copenhagen, Denmark. Sanna Timonen, Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland. Krister Wennerberg, orcid.org/0000-0002-1352-4220, Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland; Biotech Research and Innovation Centre (BRIC) and Novo Nordisk Foundation Center for Stem Cell Biology (DanStem), University of Copenhagen, Copenhagen, Denmark. Tero Aittokallio (corresponding author), [email protected], orcid.org/0000-0002-0886-9769, Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland; Department of Mathematics and Statistics, University of Turku, Turku, Finland; Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway; Oslo Centre for Biostatistics and Epidemiology (OCBE), Faculty of Medicine, University of Oslo, Oslo, Norway. Corresponding author. Tel: +358 50 3182426; E-mail: [email protected] Molecular Systems Biology (2020) 16: e9195. https://doi.org/10.15252/msb.20199195 Abstract. Cellular DNA barcoding has become a popular approach to study heterogeneity of cell populations and to identify clones with differential response to cellular stimuli. However, there is a lack of reliable methods for statistical inference of differentially responding clones. Here, we used mixtures of DNA-barcoded cell pools to generate a realistic benchmark read count dataset for modelling a range of outcomes of clone-tracing experiments. By accounting for the statistical properties intrinsic to the DNA barcode read count data, we implemented an improved algorithm that results in a significantly lower false-positive rate, compared to current RNA-seq data analysis algorithms, especially when detecting differentially responding clones in experiments with strong selection pressure. Building on the reliable statistical methodology, we illustrate how multidimensional phenotypic profiling enables one to deconvolute phenotypically distinct clonal subpopulations within a cancer cell line. The mixture control dataset and our analysis results provide a foundation for benchmarking and improving algorithms for clone-tracing experiments. Synopsis. The study presents DEBRA (DESeq-based Barcode Representation Analysis), an algorithm that enables reliable detection of differentially represented clonal lineages from DNA barcoding experiments by accounting for the statistical properties intrinsic to the barcode read count data. 
Tagwise variance and other distributional characteristics of the barcode read count data in cellular DNA barcoding experiments are affected by sampling bottleneck, which is not taken into account by traditional analysis tools. DEBRA allows reliable detection of differentially represented clones under a wide range of bottleneck sizes and other experimental conditions. Multidimensional phenotypic profiling is introduced as a novel application of cellular DNA barcoding for inferring phenotypically distinct clonal subpopulations in heterogeneous samples. Introduction Cellular DNA barcoding was originally developed to trace clonal growth dynamics in vivo or in vitro (Gerrits et al, 2010; Nguyen et al, 2014, 2015; Porter et al, 2014; Simons, 2016). More recently, however, cellular DNA barcoding has been applied as an effective means to detect clone-specific differences in the phenotypes other than growth, including drug response (Bhang et al, 2015; Hata et al, 2016; Lan et al, 2017; preprint: Acar et al, 2019; Bell et al, 2019; Caiado et al, 2019; Echeverria et al, 2019; Merino et al, 2019; Seth et al, 2019), postsurgical recurrence (Roh et al, 2018), reprogramming capacity (Biddy et al, 2018; Shakiba et al, 2019), phenotypic plasticity (Lan et al, 2017; Mathis et al, 2017) and metastatic potential (Wagenblast et al, 2015; Echeverria et al, 2018; Merino et al, 2019). Generally, cellular DNA barcoding can be widely applied to quantify and trace in time clone-specific differences in virtually any phenotype for which a phenotype-based cell selection method exists. Unlike single-cell RNA transcriptomics-based reconstruction of cell lineage trees from the RNA expression profiles, the DNA barcoding-based clone tracing provides an unambiguous way to trace the identity of a particular clone over time and accurately quantify the changes in the clone sizes in response to a perturbation. 
Therefore, emerging methodologies seek to integrate DNA barcoding-based clone tracing with single-cell technologies, such as scRNA-seq (Biddy et al, 2018; Fletcher et al, 2018; Kester & van Oudenaarden, 2018; Raj et al, 2018; preprint: Weinreb et al, 2018), or even isolate clones carrying a barcode of interest for in-depth cellular profiling (Al'Khafaji et al, 2018; preprint: Rebbeck et al, 2018; preprint: Akimov et al, 2019). These developments are expected to provide even higher-resolution insights into the biology of heterogeneous cellular systems. However, to our knowledge, there have been no systematic efforts to benchmark the accuracy of clonal phenotype quantification via DNA barcoding. In a typical clone-tracing experiment (Fig 1A), cells are infected with virus particles carrying a short semi-random DNA sequence, a “barcode”. The infection is performed at a very low multiplicity of infection (MOI) to ensure that each cell receives only one barcode. After that, the cells are expanded to achieve a sufficient representation of individual clones and divided into samples, typically “control” and “treatment” pools, where the control pool determines a background barcode representation, whereas the treatment pool(s) are subjected to a phenotype-based selection (e.g. drug treatment, immunophenotyping or xenografting). Finally, the barcodes are PCR-amplified from genomic DNA, and the barcode frequencies are estimated within each pool with next-generation sequencing (NGS). In the quantification phase, clone sizes are assumed to be proportional to the barcode abundances, and accordingly, differentially represented barcodes (DRBs) between the treatment pool(s) and control population indicate clone-specific differences in the particular phenotype. Figure 1. An overview of the experimental setup for the benchmark dataset generation. A schematic presentation of a typical clone-tracing experiment (see text for description). 
To generate the benchmark barcode count datasets, we performed two independent high-complexity DNA barcoding experiments on Mia-PaCa-2 and OVCAR5 cell lines (see Materials and Methods for details). In each experiment, cells were collected after the selection and expansion step (Fig 1A) to produce two cell pools (Pool A and Pool B). Cells in each pool were counted and mixed in a 50/50 ratio to produce the “AB mix”. The AB mix was then sampled to various extents in two replicas to produce so-called null samples with different numbers of cells (20 × 10³, 40 × 10³, 80 × 10³, 160 × 10³, 330 × 10³, 660 × 10³), but with the same expected representation of each barcode. Perturbed samples were generated by taking either 20, 40, 80 or 160 thousand cells from the AB mix, and adding the indicated percentage of cells from Pool A (e.g. for a sample with 160 × 10³ cells and a perturbation degree of 35%, we added 160 × 10³ × 0.35 = 56 × 10³ cells from Pool A). The number of replicas for each sample is indicated in circles next to the tube icon. Barcode representation fold changes (log2) in the null samples of the indicated sizes (number of cells sampled from the AB mix), relative to the mean of two Null-660 replicas. Barcodes are ordered according to their size in the Null-660 samples. Pool A barcodes are sorted in decreasing order, and Pool B barcodes are ordered in increasing order. Boxes represent interquartile ranges for each group of 53 observations. Whiskers indicate upper and lower quartiles. Central line corresponds to the median value. Same data as in (C) but for the perturbed samples. Dotted lines indicate the expected barcode fold changes calculated using the formula: (cells from pool A/total number of cells)/0.5, for the Pool A barcodes, and similarly for the Pool B barcodes. Data representation is the same as in (C). 
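The expected fold changes quoted in the Figure 1 legend can be worked through numerically. The following is a minimal sketch, not the authors' code: the function name and its arguments are hypothetical, and it assumes that the AB mix is exactly 50/50 and that the "total number of cells" in the legend's formula counts both the AB mix cells and the cells added from Pool A.

```python
import math


def expected_log2_fold_changes(mix_cells, perturbation_degree):
    # Hypothetical helper: expected log2 fold changes for Pool A and Pool B
    # barcodes in a perturbed sample, relative to the unperturbed AB mix.
    added_from_a = mix_cells * perturbation_degree  # e.g. 160e3 * 0.35 = 56e3
    total_cells = mix_cells + added_from_a
    # Assumption: the AB mix contributes half of its cells to each pool.
    cells_a = 0.5 * mix_cells + added_from_a
    cells_b = 0.5 * mix_cells
    # Legend formula: (cells from pool / total number of cells) / 0.5
    fold_a = (cells_a / total_cells) / 0.5
    fold_b = (cells_b / total_cells) / 0.5
    return math.log2(fold_a), math.log2(fold_b)


# Worked example from the legend: 160 x 10^3 cells, perturbation degree 35%.
log2_a, log2_b = expected_log2_fold_changes(160e3, 0.35)
print(round(log2_a, 3), round(log2_b, 3))
```

Under these assumptions, Pool A barcodes are expected to be enriched and Pool B barcodes depleted, and the two linear-scale fold changes always sum to 2 because the pool fractions sum to 1.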
In statistical terms, the detection of DRBs can be considered as identification of differentially represented sequencing tags from high-throughput count data, and RNA-seq data analysis algorithms have been applied to this task (Seth et al, 2019). However, we hypothesized that barcode count data from clone-tracing experiments may seriously violate the basic assumptions of the RNA-seq analysis algorithms (i.e. that tagwise variance is homogeneous and the read counts follow a negative binomial distribution). We reasoned that the tagwise variance and the underlying distribution of the barcode read counts may depend on the extent of the sampling bottleneck introduced by the experimental manipulations (e.g. treatment). Such sample size reduction can be extremely high in some applications (e.g. high doses of a drug, cell sorting for rare subpopulations or xenotransplantation), leading to a narrow sampling bottleneck. Therefore, differences in the selection pressure (and hence sampling size) may result in a biased performance of DRB detection with the current RNA-seq analysis algorithms, unless corrected for. Here, we performed multiple clone-tracing experiments on cancer cell lines to generate barcoded cell pools with non-overlapping sets of barcodes. We used these cell pools to produce benchmarking barcode read count datasets that model various outcomes of clone-tracing experiments. Our design simulates varying degrees of sampling-induced biases and clone-specific responses, with known ground truth to allow for benchmarking of DRB detection algorithms. We compared the commonly used RNA-seq analysis algorithms, DESeq (Anders & Huber, 2010), DESeq2 (Love et al, 2014) and edgeR (Robinson et al, 2010; McCarthy et al, 2012). 
Based on the benchmarking results, we developed the DEBRA (DESeq-based Barcode Representation Analysis) algorithm for more reliable clone tracing through improved DRB detection accuracy and a proper control for false discoveries in a wide range of experimental conditions. Finally, we demonstrate how multidimensional phenotypic profiling can be implemented on barcoded cancer cells to identify phenotypically distinct clonal subpopulations. Results. A benchmark dataset for modelling response heterogeneity in clone-tracing experiments. To systematically study the effect of sampling on DNA barcode count data, and the applicability of the RNA-seq data analysis algorithms to the identification of differentially responding clonal lineages, we generated benchmark barcode read count datasets with known ground truth for differential barcode representation and realistic barcode frequency distribution. Specifically, we performed high-complexity cellular DNA barcoding experiments on two cancer cell lines: OVCAR5 and Mia-PaCa-2 (see Materials and Methods). Each cell line was independently transduced in two replicas, selected with antibiotic and expanded to produce two cell pools with non-overlapping sets of DNA barcodes (Pool A and Pool B, see Fig 1). For each cell line, the produced cell pools were mixed in a 50/50 ratio to generate the AB mix (Fig 1B), from which 18 samples of different sizes were drawn (null samples; Fig 1B). This experimental design models a scenario in which different degrees of selection pressure (and hence bottleneck sizes) are applied to a sample with no clone-specific differences in response to the selection pressure (e.g. treatment; Fig 1C). We called these samples null samples because no barcode is expected to be differentially represented, and therefore, an accurate DRB detection algorithm is supposed to accept the null hypothesis for all the barcodes. 
Such null samples enabled us to study the effect of sampling size on the statistical characteristics of barcode count data and to estimate the false discovery rate of DRB detection algorithms. Furthermore, we generated 24 perturbed samples, where the representation of a set of barcodes in the AB mix was changed by adding an extra number of cells from the barcoded cell Pool A (Fig 1B). Perturbed samples model the outcome of clone-tracing experiments on a cellular population with varying degrees of clone-specific responses to the selection pressure (e.g. treatment; Fig 1D). By sequencing Pool A and Pool B, we determined the ground truth for differential representation of the barcodes in the perturbed samples, which allowed us to benchmark the accuracy of the DRB detection algorithms. Sampling bottleneck affects statistical properties of DNA barcode count data and DRB detection accuracy. To investigate the statistical characteristics of the benchmark barcode count data, we first analysed the mean–variance relationships for each pair of null samples. We found a marked increase in the variance as the size of the sample decreases in both OVCAR5 and Mia-PaCa-2 cells (Figs 2A and B, and EV1A and B). We observed a similar dependency in the data from a pancreatic cancer patient-derived xenograft (PDX) published by Seth et al (2019; Fig 2A and B), where the variance of the drug-treated samples is much higher than that of the non-treated controls. The observed difference is likely due to the decrease in the total number of cells (sample size) in response to the drug treatment. We next tested how well the barcode count data follow a negative binomial (NB) distribution using goodness-of-fit estimation for our OVCAR5 and Mia-PaCa-2 null samples and the published pancreatic PDX samples (Seth et al, 2019). 
Notably, the NB model poorly approximated the barcode count data in the low count region both in the small-sized OVCAR5 null samples and in the PDX drug-treated samples (Figs 2C and EV2C). These properties of the barcode count data violate the basic assumptions made in the RNA-seq data analysis algorithms, which may lead to sub-optimal performance when applied to DRB detection in clone-tracing experiments. Figure 2. Sampling size affects the statistical properties and accuracy of DRB detection. Mean–variance plots for the benchmark OVCAR5 null samples and pancreatic cancer patient-derived xenograft (PDX) samples (Seth et al, 2019). Local variance was calculated by averaging a tagwise variance over the mean counts using a 20 read-count window. Scatterplots of median-normalized read counts (log10) of OVCAR5 null samples and pancreatic PDX samples (Seth et al, 2019). Local goodness-of-fit testing for negative binomial distribution where the distribution parameters were estimated using maximum-likelihood estimator (MLE). Two-sample Cramer–von Mises test was used to compare the observed and simulated negative binomial random variables. Statistical significance was determined using Monte Carlo bootstrap method, where a small empirical P-value indicates strong deviation from the negative binomial distribution. The proportion of differentially represented barcodes (DRBs) identified in the OVCAR5 null samples with various versions of RNA-seq analysis algorithms. Two replicas of the null samples of indicated sizes (x-axis) were tested for DRBs against a control group of 4 null samples (two Null-660 samples and two Null-330 samples). The bars represent the average proportion of DRBs identified with the algorithms, calculated over threefold bootstrap runs (mean of the 10 resamples with replacement) under the indicated false discovery rates (FDRs). The version with unadjusted P-values is shown in Appendix Fig S1A for comparison. 
Error bars, SD; LRT, likelihood ratio test; Wald, Wald test; QLF, quasi-likelihood F-test; and exact, implementation of exact test proposed by Robinson and Smyth (Robinson & Smyth, 2008), as implemented in the original algorithms. Figure EV1. Clone size characteristics of the benchmark datasets. Cumulative distributions of clone sizes in OVCAR-5 null-660 sample (left) and Mia-Paca-2 null-40 sample (right). Barcode representation fold changes (log2) for the null samples of the indicated sizes (number of cells subsampled from the AB mix) relative to the mean of two Null-660 replicas. Barcodes are ordered according to the size in the Null-660 subsamples. Pool A barcodes are sorted in the descending order, and Pool B barcodes are ordered in the ascending order. Boxes represent interquartile ranges (25 to 75 percentile) for each group of 53 observations. Whiskers indicate upper and lower quartiles. Central line corresponds to the median value. Same as Fig EV1B but for the perturbed subsamples. Dotted lines indicate the expected barcode fold changes calculated using the formula: (cells from pool A/total number of cells)/0.5, for the Pool A barcodes, and the formula: (cells from pool B/total number of cells)/0.5, for the Pool B barcodes. Data representation is the same as in (B). Figure EV2. Sampling size affects statistical properties and accuracy of DRB calling. Mean–variance plots for the benchmark OVCAR5 null subsamples (replica#2) and perturbed subsamples (35% perturbation degree; replicas #1 and #2). Local variance was calculated by averaging a tagwise variance over the mean counts using a 20 read-count window. Mean counts were estimated using all the null or perturbed samples, respectively. Mean–variance plots for Mia-PaCa-2 null subsamples. Barcode read counts were median-normalized. 
Local variance was calculated by averaging a tagwise variance over the mean counts using a 20 read-count window. Scatter plots of median-normalized read counts of Mia-PaCa-2 null subsamples. Local negative binomial goodness of fit was estimated using chi-squared test or Cramer–von Mises test. Dispersion parameter of the negative binomial model was estimated locally over the window of 3 read counts using maximum-likelihood estimator. P-value of the chi-squared test statistics was estimated using fitdistrplus::gofstat() function. P-values of the Cramer–von Mises test were calculated by Monte Carlo bootstrap method as implemented in RVAideMemoire::cramer.test. To test the performance of the RNA-seq analysis algorithms for the identification of DRBs, we applied the widely used algorithms (DESeq, DESeq2 and edgeR) on the OVCAR5 null samples. An accurate DRB detection method is expected to accept the null hypothesis for all the barcodes (i.e. no barcode should be identified as differentially represented), since the representation of the barcodes is equal across the null samples. However, all the tested versions of the algorithms identified a significant number of DRBs between the null samples of different sizes, with percentages of DRBs reaching 50% at smaller sample sizes and higher FDR levels (Fig 2D). We note that all these detections are false positives, and all the algorithms had much higher type I error rates than those expected based on their empirical P-values (Appendix Fig S1A). DESeq performed better than the other algorithms, yet it identified more than 15% false positives at a sample size of 20 × 10³ cells with a nominal FDR level of 0.25. Moreover, the performance of DESeq decreased when implemented in other designs (Appendix Fig S1B). With all the tested algorithms, the proportion of falsely detected DRBs increased when comparing null samples with larger differences in size and hence variance. 
These analysis results show that the decrease in sample size due to the selection pressure or any other manipulation leading to cell loss may severely compromise the accuracy of DRB detection with the standard RNA-seq analysis algorithms. Modified versions of DESeq and DESeq2 algorithms effectively control for false discoveries. Mean–variance modelling is central for the inference of differentially represented tags by the RNA-seq analysis algorithms. In the DESeq and DESeq2 algorithms, the tagwise variances are estimated by fitting a negative binomial (NB) generalized linear model, which assumes variance homogeneity across sample groups. However, when the variances are not homogeneous, which is the case for the DNA barcoding data (Fig 2A), the resulting tagwise estimates will become close to the average variance between the control and treatment samples. In this case, the subsequent statistical test will be performed against an NB model with a dispersion parameter different from that of the treatment sample, hence compromising the accuracy of the DRB detection. Therefore, we reasoned that the observed high rates of false discoveries by standard RNA-seq analysis algorithms (Fig 2D) are caused by the differences between the variances of the control and test samples (Fig 2A). This notion is supported by the fact that the rate of false discoveries was dependent on the sample size and hence variance difference between control and treatment samples (Fig 2D). Another possible source of false discoveries is the deviation from the NB model in the low count regions (Figs 2C and EV2C), which renders the statistical tests assuming an NB model non-applicable for non-NB barcodes. To address these statistical issues, we implemented two modifications to the DESeq2 and DESeq algorithms: (1) We modified the variance estimation procedure so that the estimation of the tagwise variances is performed exclusively from the replicates of test samples (e.g. treated samples). 
Two different options for the variance estimation were investigated (see below and Materials and Methods for details). (2) We implemented a heuristic algorithm that estimates a group-specific read count level (so-called β threshold, see Materials and Methods), above which the read counts follow the NB model. The estimated β threshold is used as a lower bound for the independent filtering step (Bourgon et al, 2010; Love et al, 2014). For the variance estimation, we adopted two widely used options, which we refer to as “trended” and “shrunk” methods. The trended method corresponds to the classical approach for mean–variance relationship modelling that estimates tagwise variances from the local mean–variance model as fitted by the DESeq2 algorithm (via locfit R package; Loader, 2013). The shrunk method corresponds to the default method for dispersion estimation as implemented in DESeq2, where the tagwise variances are calculated via Bayesian shrinkage of individual estimates towards the mean–variance trend (Love et al, 2014). The proposed β thresholding approach aims to prevent possible false discoveries arising from the read counts which do not follow the NB model, while taking advantage of the improved detection power provided by the independent filtering algorithm (Bourgon et al, 2010; Love et al, 2014; see Materials and Methods for details). We implemented the modified DESeq and DESeq2 algorithms into a method, dubbed DEBRA (DESeq-based Barcode Representation Analysis), which is available through the GitHub portal (https://github.com/YevhenAkimov/DEBRA). To benchmark the modified algorithms, we first applied DEBRA to the OVCAR5 null samples. The modified methods correctly accepted the null hypothesis for virtually all the barcodes when the null samples were tested against each other (Fig 3A), hence demonstrating a greatly improved control for false discoveries compared to the original algorithms. 
When the trended dispersion estimates were used, the proportion of identified DRBs was within the range of 0–1.5 × 10⁻³, while the shrunken estimates led to a somewhat increased false-positive DRB rate of up to 4 × 10⁻³. To evaluate the relative contributions of the two modifications implemented in DEBRA to the false discovery rate, we first ran the DEBRA algorithm without the β thresholding step. We observed a drastic drop in the number of false discoveries (Fig EV3), suggesting that incorrect dispersion estimation is responsible for most of the false discoveries. The remaining false discoveries were in the low read count region and were therefore eliminated when the β thresholding was applied (Fig EV3). Figure 3. Comparison of the algorithms’ performance. Circles to the left of the algorithms’ names indicate the modified algorithms. Barplots of the percentage of DRBs identified by the modified algorithms in the OVCAR5 null samples, calculated over threefold bootstrap runs (10 resamples with replacement) using the same design as in Fig 2D. Error bars, SD. The performance of the original and modified algorithms for detection of enriched barcodes in the perturbed samples. Two replicas of the sample with perturbation degree of 35%, indicated size (top) and enriched proportions (right), were tested against four null samples (two replicas of Null-660 samples and two replicas of Null-330 samples). The bars represent the average percentage of the barcodes detected as enriched DRBs (fold change > 0; FDR < 0.25) by the indicated algorithm, calculated over threefold bootstrap runs (10 resamples with replacement). Correctly assigned barcodes (classified according to the ground truth) are marked in blue and incorrectly detected barcodes (not classified according to the ground truth) are marked in red (see Fig EV3 for the results in the samples with other perturbation degrees and proportions of enriched barcodes). 
White circles mark the percentage of barcodes corresponding to the nominal FDR level. Error bars, SD. The standardized partial area under the precision-recall curve (pAUC), calculated using intervals of [0,1] for precision and [0,X] for recall, where X is the mean recall value at FDR = 0.25 for a given sample over all the tested algorithms. The panel shows the pAUC for perturbed samples of indicated size and perturbation degree with enriched barcode proportion of 0.5 (see Appendix Figs S5 and S6 for pAUCs and precision-recall curves for other sample sizes, perturbation degrees and proportions of enriched barcodes). For calculating the precision and recall metrics, we ranked the barcodes according to their unadjusted P-values as classification scores, where the positive class was defined as correctly detected barcodes (correctly assigned to either enriched or depleted group; see Materials and Methods for details). A total of 10 threefold bootstrap runs with replacement were performed. Boxes rep" @default.
- W3011055447 created "2020-03-23" @default.
- W3011055447 creator A5016681267 @default.
- W3011055447 creator A5025121259 @default.
- W3011055447 creator A5026664404 @default.
- W3011055447 creator A5064674972 @default.
- W3011055447 creator A5067760174 @default.
- W3011055447 date "2020-03-01" @default.
- W3011055447 modified "2023-10-17" @default.
- W3011055447 title "Improved detection of differentially represented DNA barcodes for high‐throughput clonal phenomics" @default.
- W3011055447 cites W1492083415 @default.
- W3011055447 cites W1979486580 @default.
- W3011055447 cites W1987191607 @default.
- W3011055447 cites W2016667239 @default.
- W3011055447 cites W2032518018 @default.
- W3011055447 cites W2037303548 @default.
- W3011055447 cites W2048060724 @default.
- W3011055447 cites W2058358568 @default.
- W3011055447 cites W2058977958 @default.
- W3011055447 cites W2100122648 @default.
- W3011055447 cites W2112239573 @default.
- W3011055447 cites W2114104545 @default.
- W3011055447 cites W2115235864 @default.
- W3011055447 cites W2119204091 @default.
- W3011055447 cites W2129375080 @default.
- W3011055447 cites W2130116522 @default.
- W3011055447 cites W2141425631 @default.
- W3011055447 cites W2150479858 @default.
- W3011055447 cites W2152239989 @default.
- W3011055447 cites W2168127427 @default.
- W3011055447 cites W2179438025 @default.
- W3011055447 cites W2181542557 @default.
- W3011055447 cites W2231820207 @default.
- W3011055447 cites W2272802268 @default.
- W3011055447 cites W2276370059 @default.
- W3011055447 cites W2298134893 @default.
- W3011055447 cites W2338215681 @default.
- W3011055447 cites W2510031061 @default.
- W3011055447 cites W2551865970 @default.
- W3011055447 cites W2560648000 @default.
- W3011055447 cites W2587997077 @default.
- W3011055447 cites W2626695356 @default.
- W3011055447 cites W2752214746 @default.
- W3011055447 cites W2769137962 @default.
- W3011055447 cites W2781417302 @default.
- W3011055447 cites W2788809291 @default.
- W3011055447 cites W2795331645 @default.
- W3011055447 cites W2800852522 @default.
- W3011055447 cites W2801820889 @default.
- W3011055447 cites W2886485146 @default.
- W3011055447 cites W2889326414 @default.
- W3011055447 cites W2901738572 @default.
- W3011055447 cites W2901757858 @default.
- W3011055447 cites W2902451951 @default.
- W3011055447 cites W2907308560 @default.
- W3011055447 cites W2912649747 @default.
- W3011055447 cites W2914661118 @default.
- W3011055447 cites W2914759417 @default.
- W3011055447 cites W2924170852 @default.
- W3011055447 cites W2936989468 @default.
- W3011055447 cites W2950859937 @default.
- W3011055447 cites W2952874019 @default.
- W3011055447 cites W2985607483 @default.
- W3011055447 doi "https://doi.org/10.15252/msb.20199195" @default.
- W3011055447 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/7080434" @default.
- W3011055447 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/32187448" @default.
- W3011055447 hasPublicationYear "2020" @default.
- W3011055447 type Work @default.
- W3011055447 sameAs 3011055447 @default.
- W3011055447 citedByCount "11" @default.
- W3011055447 countsByYear W30110554472020 @default.
- W3011055447 countsByYear W30110554472021 @default.
- W3011055447 countsByYear W30110554472022 @default.
- W3011055447 countsByYear W30110554472023 @default.
- W3011055447 crossrefType "journal-article" @default.
- W3011055447 hasAuthorship W3011055447A5016681267 @default.
- W3011055447 hasAuthorship W3011055447A5025121259 @default.
- W3011055447 hasAuthorship W3011055447A5026664404 @default.
- W3011055447 hasAuthorship W3011055447A5064674972 @default.
- W3011055447 hasAuthorship W3011055447A5067760174 @default.
- W3011055447 hasBestOaLocation W30110554471 @default.
- W3011055447 hasConcept C104317684 @default.
- W3011055447 hasConcept C141231307 @default.
- W3011055447 hasConcept C157764524 @default.
- W3011055447 hasConcept C189206191 @default.
- W3011055447 hasConcept C41008148 @default.
- W3011055447 hasConcept C54355233 @default.
- W3011055447 hasConcept C552990157 @default.
- W3011055447 hasConcept C555944384 @default.
- W3011055447 hasConcept C70721500 @default.
- W3011055447 hasConcept C76155785 @default.
- W3011055447 hasConcept C86803240 @default.
- W3011055447 hasConcept C98108635 @default.
- W3011055447 hasConceptScore W3011055447C104317684 @default.
- W3011055447 hasConceptScore W3011055447C141231307 @default.
- W3011055447 hasConceptScore W3011055447C157764524 @default.
- W3011055447 hasConceptScore W3011055447C189206191 @default.
- W3011055447 hasConceptScore W3011055447C41008148 @default.
- W3011055447 hasConceptScore W3011055447C54355233 @default.
- W3011055447 hasConceptScore W3011055447C552990157 @default.