Matches in SemOpenAlex for { <https://semopenalex.org/work/W4313826095> ?p ?o ?g. }
- W4313826095 endingPage "105" @default.
- W4313826095 startingPage "105" @default.
- W4313826095 abstract "Computational methods for creating in silico libraries of molecular descriptors (e.g., collision cross sections) are becoming increasingly prevalent due to the limited number of authentic reference materials available for traditional library building. These so-called reference-free metabolomics methods require sampling sets of molecular conformers in order to produce high-accuracy property predictions. Due to the computational cost of the subsequent calculations for each conformer, there is a need to sample the most relevant subset and avoid repeating calculations on conformers that are nearly identical. The goal of this study is to introduce a heuristic method of finding the most dissimilar conformers from a larger population in order to help speed up reference-free calculation methods and maintain a high property prediction accuracy. Finding the set of the n items most dissimilar from each other out of a larger population becomes increasingly difficult and computationally expensive as either n or the population size grows large. Because there exists a pairwise relationship between each item and all other items in the population, finding the set of the n most dissimilar items is different from simply sorting an array of numbers. For instance, if you have the set of the n = 4 most dissimilar items, one or more items from that set might not be in the set for n = 5. An exact solution would have to search all possible combinations of size n in the population exhaustively. We present open-source software called similarity downselection (SDS), written in Python and freely available on GitHub. SDS implements a heuristic algorithm for quickly finding the approximate set(s) of the n most dissimilar items. We benchmark SDS against a Monte Carlo method, which attempts to find the exact solution through repeated random sampling.
We show that, for finding the set of the n most dissimilar conformers, SDS is not only orders of magnitude faster but also more accurate than running Monte Carlo for 1,000,000 iterations, each searching for set sizes n = 3-7 out of a population of 50,000. We also benchmark SDS against the exact solution for small example populations, showing that SDS produces a solution close to the exact solution in these instances. Using theoretical approaches, we also demonstrate the constraints of the greedy algorithm and its efficacy as a ratio to the exact solution." @default.
- W4313826095 created "2023-01-09" @default.
- W4313826095 creator A5001785077 @default.
- W4313826095 creator A5034592694 @default.
- W4313826095 creator A5046319147 @default.
- W4313826095 creator A5051101756 @default.
- W4313826095 creator A5071088289 @default.
- W4313826095 creator A5085367465 @default.
- W4313826095 date "2023-01-09" @default.
- W4313826095 modified "2023-09-25" @default.
- W4313826095 title "Similarity Downselection: Finding the n Most Dissimilar Molecular Conformers for Reference-Free Metabolomics" @default.
- W4313826095 cites W1976848779 @default.
- W4313826095 cites W1991805259 @default.
- W4313826095 cites W1999068031 @default.
- W4313826095 cites W2009588798 @default.
- W4313826095 cites W2010890795 @default.
- W4313826095 cites W2041818766 @default.
- W4313826095 cites W2044024461 @default.
- W4313826095 cites W2079439262 @default.
- W4313826095 cites W2088387982 @default.
- W4313826095 cites W2114326383 @default.
- W4313826095 cites W2161160262 @default.
- W4313826095 cites W2164719474 @default.
- W4313826095 cites W2169678694 @default.
- W4313826095 cites W2520303620 @default.
- W4313826095 cites W2601948253 @default.
- W4313826095 cites W2789302726 @default.
- W4313826095 cites W2911354688 @default.
- W4313826095 cites W2947899858 @default.
- W4313826095 cites W2963877232 @default.
- W4313826095 cites W3006234984 @default.
- W4313826095 cites W3026685272 @default.
- W4313826095 cites W3130080648 @default.
- W4313826095 cites W3146454937 @default.
- W4313826095 doi "https://doi.org/10.3390/metabo13010105" @default.
- W4313826095 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/36677030" @default.
- W4313826095 hasPublicationYear "2023" @default.
- W4313826095 type Work @default.
- W4313826095 citedByCount "0" @default.
- W4313826095 crossrefType "journal-article" @default.
- W4313826095 hasAuthorship W4313826095A5001785077 @default.
- W4313826095 hasAuthorship W4313826095A5034592694 @default.
- W4313826095 hasAuthorship W4313826095A5046319147 @default.
- W4313826095 hasAuthorship W4313826095A5051101756 @default.
- W4313826095 hasAuthorship W4313826095A5071088289 @default.
- W4313826095 hasAuthorship W4313826095A5085367465 @default.
- W4313826095 hasBestOaLocation W43138260951 @default.
- W4313826095 hasConcept C103278499 @default.
- W4313826095 hasConcept C111919701 @default.
- W4313826095 hasConcept C115961682 @default.
- W4313826095 hasConcept C124101348 @default.
- W4313826095 hasConcept C144024400 @default.
- W4313826095 hasConcept C149923435 @default.
- W4313826095 hasConcept C154945302 @default.
- W4313826095 hasConcept C177264268 @default.
- W4313826095 hasConcept C178790620 @default.
- W4313826095 hasConcept C184898388 @default.
- W4313826095 hasConcept C185592680 @default.
- W4313826095 hasConcept C18705241 @default.
- W4313826095 hasConcept C199360897 @default.
- W4313826095 hasConcept C2777904410 @default.
- W4313826095 hasConcept C2908647359 @default.
- W4313826095 hasConcept C32909587 @default.
- W4313826095 hasConcept C41008148 @default.
- W4313826095 hasConcept C519991488 @default.
- W4313826095 hasConceptScore W4313826095C103278499 @default.
- W4313826095 hasConceptScore W4313826095C111919701 @default.
- W4313826095 hasConceptScore W4313826095C115961682 @default.
- W4313826095 hasConceptScore W4313826095C124101348 @default.
- W4313826095 hasConceptScore W4313826095C144024400 @default.
- W4313826095 hasConceptScore W4313826095C149923435 @default.
- W4313826095 hasConceptScore W4313826095C154945302 @default.
- W4313826095 hasConceptScore W4313826095C177264268 @default.
- W4313826095 hasConceptScore W4313826095C178790620 @default.
- W4313826095 hasConceptScore W4313826095C184898388 @default.
- W4313826095 hasConceptScore W4313826095C185592680 @default.
- W4313826095 hasConceptScore W4313826095C18705241 @default.
- W4313826095 hasConceptScore W4313826095C199360897 @default.
- W4313826095 hasConceptScore W4313826095C2777904410 @default.
- W4313826095 hasConceptScore W4313826095C2908647359 @default.
- W4313826095 hasConceptScore W4313826095C32909587 @default.
- W4313826095 hasConceptScore W4313826095C41008148 @default.
- W4313826095 hasConceptScore W4313826095C519991488 @default.
- W4313826095 hasFunder F4320337361 @default.
- W4313826095 hasIssue "1" @default.
- W4313826095 hasLocation W43138260951 @default.
- W4313826095 hasLocation W43138260952 @default.
- W4313826095 hasLocation W43138260953 @default.
- W4313826095 hasOpenAccess W4313826095 @default.
- W4313826095 hasPrimaryLocation W43138260951 @default.
- W4313826095 hasRelatedWork W2088791420 @default.
- W4313826095 hasRelatedWork W2148556617 @default.
- W4313826095 hasRelatedWork W2327204559 @default.
- W4313826095 hasRelatedWork W2569300915 @default.
- W4313826095 hasRelatedWork W2587671147 @default.
- W4313826095 hasRelatedWork W2623240261 @default.
- W4313826095 hasRelatedWork W2935352645 @default.
- W4313826095 hasRelatedWork W3129254793 @default.
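
The abstract above contrasts a greedy heuristic with an exhaustive exact search for the n most dissimilar items. The following is a minimal sketch of that general idea, not the SDS implementation: the function names (`set_score`, `greedy_dissimilar`, `exact_dissimilar`), the toy data, and the choice of objective (sum of pairwise dissimilarities within the selected set) are all assumptions made for illustration, and the actual SDS selection rule may differ.

```python
from itertools import combinations


def set_score(dist, items):
    """Sum of pairwise dissimilarities within `items` (assumed objective)."""
    return sum(dist[i][j] for i, j in combinations(items, 2))


def greedy_dissimilar(dist, n):
    """Greedily pick n indices with large mutual dissimilarity (hypothetical helper)."""
    size = len(dist)
    if not 2 <= n <= size:
        raise ValueError("n must be between 2 and the population size")
    # Seed with the single most dissimilar pair.
    chosen = list(max(combinations(range(size), 2),
                      key=lambda p: dist[p[0]][p[1]]))
    # gain[k] = total dissimilarity from item k to the items chosen so far.
    gain = [dist[k][chosen[0]] + dist[k][chosen[1]] for k in range(size)]
    while len(chosen) < n:
        remaining = [k for k in range(size) if k not in chosen]
        best = max(remaining, key=lambda k: gain[k])
        chosen.append(best)
        # Update every item's gain to account for the newly chosen item.
        for k in range(size):
            gain[k] += dist[k][best]
    return sorted(chosen)


def exact_dissimilar(dist, n):
    """Exhaustive search over all size-n subsets (feasible for small populations only)."""
    return list(max(combinations(range(len(dist)), n),
                    key=lambda s: set_score(dist, s)))


# Toy population: five points on a line; dissimilarity is the absolute difference.
points = [0.0, 1.0, 2.0, 9.0, 10.0]
dist = [[abs(a - b) for b in points] for a in points]

greedy = greedy_dissimilar(dist, 3)
exact = exact_dissimilar(dist, 3)
print("greedy:", greedy, "score:", set_score(dist, greedy))
print("exact: ", exact, "score:", set_score(dist, exact))
```

The greedy pass evaluates each remaining item once per added member, whereas the exact search enumerates every combination of size n, which is why the exhaustive approach is only practical for small populations. On this toy input the two happen to reach the same score; in general a greedy selection is only an approximation.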