Matches in SemOpenAlex for { <https://semopenalex.org/work/W2952743539> ?p ?o ?g. }
- W2952743539 abstract "A critical goal in biology is to relate the phenotype to the genotype, that is, to find the genetic determinants of various traits. However, while simple monofactorial determinants are relatively easy to identify, the underpinnings of complex phenotypes are harder to predict. While traditional approaches rely on genome-wide association studies based on Single Nucleotide Polymorphism data, the ability of machine learning algorithms to find these determinants in whole proteome data is still not well known. To better understand the applicability of machine learning in this case, we implemented two such algorithms, adaptive boosting (AB) and repeated random forest (RRF), and developed a chunking layer that facilitates the analysis of whole proteome data. We first assessed the performance of these algorithms and tuned them on an influenza data set, for which the determinants of three complex phenotypes (infectivity, transmissibility, and pathogenicity) are known based on experimental evidence. This allowed us to show that chunking improves runtimes by an order of magnitude. Based on simulations, we showed that chunking also increases sensitivity of the predictions, reaching 100% with as few as 20 sequences in a small proteome as in the influenza case (5k sites), but may require at least 30 sequences to reach 90% on larger alignments (500k sites). While RRF has less specificity than random forest, it was never <50%, and RRF sensitivity was significantly higher at smaller chunk sizes. We then used these algorithms to predict the determinants of three types of drug resistance (to Ciprofloxacin, Ceftazidime, and Gentamicin) in a bacterium, Pseudomonas aeruginosa. 
While both algorithms performed well in the case of the influenza data, results were more nuanced in the bacterial case, with RRF making more sensible predictions, with smaller error rates, than AB. Altogether, we demonstrated that ML algorithms can be used to identify genetic determinants in small proteomes (viruses), even when trained on small numbers of individuals. We further showed that our RRF algorithm may deserve more scrutiny, which should be facilitated by the decreasing costs of both sequencing and phenotyping of large cohorts of individuals." @default.
- W2952743539 created "2019-06-27" @default.
- W2952743539 creator A5009846007 @default.
- W2952743539 creator A5058662789 @default.
- W2952743539 creator A5068557724 @default.
- W2952743539 creator A5086215460 @default.
- W2952743539 date "2019-06-10" @default.
- W2952743539 modified "2023-10-10" @default.
- W2952743539 title "Identifying genetic determinants of complex phenotypes from whole genome sequence data" @default.
- W2952743539 cites W1518623077 @default.
- W2952743539 cites W1825722006 @default.
- W2952743539 cites W1907681383 @default.
- W2952743539 cites W197019882 @default.
- W2952743539 cites W1971447305 @default.
- W2952743539 cites W1976964063 @default.
- W2952743539 cites W1977004447 @default.
- W2952743539 cites W1979006554 @default.
- W2952743539 cites W1980082881 @default.
- W2952743539 cites W1980991473 @default.
- W2952743539 cites W1981267913 @default.
- W2952743539 cites W1988533731 @default.
- W2952743539 cites W2010996691 @default.
- W2952743539 cites W2018234381 @default.
- W2952743539 cites W2030096585 @default.
- W2952743539 cites W2042103448 @default.
- W2952743539 cites W2042726815 @default.
- W2952743539 cites W2045388510 @default.
- W2952743539 cites W2045934762 @default.
- W2952743539 cites W2046378278 @default.
- W2952743539 cites W2046548856 @default.
- W2952743539 cites W2049614475 @default.
- W2952743539 cites W2052297191 @default.
- W2952743539 cites W2054606441 @default.
- W2952743539 cites W2055043387 @default.
- W2952743539 cites W2055764609 @default.
- W2952743539 cites W2056036445 @default.
- W2952743539 cites W2056584399 @default.
- W2952743539 cites W2058263296 @default.
- W2952743539 cites W2060114298 @default.
- W2952743539 cites W2066617774 @default.
- W2952743539 cites W2067885219 @default.
- W2952743539 cites W2069322612 @default.
- W2952743539 cites W2069753640 @default.
- W2952743539 cites W2070575225 @default.
- W2952743539 cites W2079502771 @default.
- W2952743539 cites W2082893165 @default.
- W2952743539 cites W2087623083 @default.
- W2952743539 cites W2088338354 @default.
- W2952743539 cites W2091583677 @default.
- W2952743539 cites W2100483895 @default.
- W2952743539 cites W2101291993 @default.
- W2952743539 cites W2104634417 @default.
- W2952743539 cites W2108530765 @default.
- W2952743539 cites W2110214166 @default.
- W2952743539 cites W2112780049 @default.
- W2952743539 cites W2119400291 @default.
- W2952743539 cites W2124159817 @default.
- W2952743539 cites W2126927090 @default.
- W2952743539 cites W2127373523 @default.
- W2952743539 cites W2129420476 @default.
- W2952743539 cites W2131478115 @default.
- W2952743539 cites W2132926880 @default.
- W2952743539 cites W2134755636 @default.
- W2952743539 cites W2135695572 @default.
- W2952743539 cites W2142669967 @default.
- W2952743539 cites W2144015117 @default.
- W2952743539 cites W2148753657 @default.
- W2952743539 cites W2150387978 @default.
- W2952743539 cites W2151469110 @default.
- W2952743539 cites W2163924952 @default.
- W2952743539 cites W2163960578 @default.
- W2952743539 cites W2164527823 @default.
- W2952743539 cites W2170731906 @default.
- W2952743539 cites W2179324586 @default.
- W2952743539 cites W2184467923 @default.
- W2952743539 cites W2292216442 @default.
- W2952743539 cites W2335343645 @default.
- W2952743539 cites W2403922294 @default.
- W2952743539 cites W2481421840 @default.
- W2952743539 cites W2496911238 @default.
- W2952743539 cites W2502949459 @default.
- W2952743539 cites W2622688675 @default.
- W2952743539 cites W2725988230 @default.
- W2952743539 cites W2731142262 @default.
- W2952743539 cites W2739165387 @default.
- W2952743539 cites W2751873519 @default.
- W2952743539 cites W2785148091 @default.
- W2952743539 cites W2804541737 @default.
- W2952743539 cites W2950672524 @default.
- W2952743539 cites W347231809 @default.
- W2952743539 cites W4211042139 @default.
- W2952743539 cites W4232478844 @default.
- W2952743539 cites W4244207479 @default.
- W2952743539 cites W4292334794 @default.
- W2952743539 cites W4297944103 @default.
- W2952743539 doi "https://doi.org/10.1186/s12864-019-5820-0" @default.
- W2952743539 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/6558885" @default.
- W2952743539 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/31182025" @default.
- W2952743539 hasPublicationYear "2019" @default.
- W2952743539 type Work @default.