Matches in SemOpenAlex for { <https://semopenalex.org/work/W2126829482> ?p ?o ?g. }
- W2126829482 endingPage "2109" @default.
- W2126829482 startingPage "2098" @default.
- W2126829482 abstract "Many modern chemoinformatics systems for small molecules rely on large fingerprint vector representations, where the components of the vector record the presence or number of occurrences in the molecular graphs of particular combinatorial features, such as labeled paths or labeled trees. These large fingerprint vectors are often compressed to much shorter fingerprint vectors using a lossy compression scheme based on a simple modulo procedure. Here, we combine statistical models of fingerprints with integer entropy codes, such as Golomb and Elias codes, to encode the indices or the run lengths of the fingerprints. After reordering the fingerprint components by decreasing frequency order, the indices are monotone-increasing and the run lengths are quasi-monotone-increasing, and both exhibit power-law distribution trends. We take advantage of these statistical properties to derive new efficient, lossless, compression algorithms for monotone integer sequences: monotone value (MOV) coding and monotone length (MOL) coding. In contrast to lossy systems that use 1024 or more bits of storage per molecule, we can achieve lossless compression of long chemical fingerprints based on circular substructures in slightly over 300 bits per molecule, close to the Shannon entropy limit, using a MOL Elias Gamma code for run lengths. The improvement in storage comes at a modest computational cost. Furthermore, because the compression is lossless, uncompressed similarity (e.g., Tanimoto) between molecules can be computed exactly from their compressed representations, leading to significant improvements in retrieval performance, as shown on six benchmark data sets of druglike molecules." @default.
- W2126829482 created "2016-06-24" @default.
- W2126829482 creator A5003755911 @default.
- W2126829482 creator A5033171266 @default.
- W2126829482 creator A5086610385 @default.
- W2126829482 creator A5088813478 @default.
- W2126829482 date "2007-10-30" @default.
- W2126829482 modified "2023-09-24" @default.
- W2126829482 title "Lossless Compression of Chemical Fingerprints Using Integer Entropy Codes Improves Storage and Retrieval" @default.
- W2126829482 cites W1969462276 @default.
- W2126829482 cites W1975447964 @default.
- W2126829482 cites W2017254234 @default.
- W2126829482 cites W2035292911 @default.
- W2126829482 cites W2059975159 @default.
- W2126829482 cites W2060097376 @default.
- W2126829482 cites W2112912103 @default.
- W2126829482 cites W2136772235 @default.
- W2126829482 cites W2137262074 @default.
- W2126829482 cites W2148293970 @default.
- W2126829482 cites W2155741020 @default.
- W2126829482 cites W2160114756 @default.
- W2126829482 cites W2200810672 @default.
- W2126829482 cites W2295528854 @default.
- W2126829482 doi "https://doi.org/10.1021/ci700200n" @default.
- W2126829482 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/2536658" @default.
- W2126829482 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/17967006" @default.
- W2126829482 hasPublicationYear "2007" @default.
- W2126829482 type Work @default.
- W2126829482 sameAs 2126829482 @default.
- W2126829482 citedByCount "49" @default.
- W2126829482 countsByYear W21268294822012 @default.
- W2126829482 countsByYear W21268294822013 @default.
- W2126829482 countsByYear W21268294822014 @default.
- W2126829482 countsByYear W21268294822015 @default.
- W2126829482 countsByYear W21268294822016 @default.
- W2126829482 countsByYear W21268294822017 @default.
- W2126829482 countsByYear W21268294822018 @default.
- W2126829482 countsByYear W21268294822019 @default.
- W2126829482 countsByYear W21268294822020 @default.
- W2126829482 crossrefType "journal-article" @default.
- W2126829482 hasAuthorship W2126829482A5003755911 @default.
- W2126829482 hasAuthorship W2126829482A5033171266 @default.
- W2126829482 hasAuthorship W2126829482A5086610385 @default.
- W2126829482 hasAuthorship W2126829482A5088813478 @default.
- W2126829482 hasBestOaLocation W21268294822 @default.
- W2126829482 hasConcept C106301342 @default.
- W2126829482 hasConcept C11413529 @default.
- W2126829482 hasConcept C115961682 @default.
- W2126829482 hasConcept C118615104 @default.
- W2126829482 hasConcept C121332964 @default.
- W2126829482 hasConcept C13481523 @default.
- W2126829482 hasConcept C154945302 @default.
- W2126829482 hasConcept C165021410 @default.
- W2126829482 hasConcept C1769480 @default.
- W2126829482 hasConcept C33923547 @default.
- W2126829482 hasConcept C41008148 @default.
- W2126829482 hasConcept C62520636 @default.
- W2126829482 hasConcept C66656319 @default.
- W2126829482 hasConcept C78548338 @default.
- W2126829482 hasConcept C81081738 @default.
- W2126829482 hasConcept C9417928 @default.
- W2126829482 hasConceptScore W2126829482C106301342 @default.
- W2126829482 hasConceptScore W2126829482C11413529 @default.
- W2126829482 hasConceptScore W2126829482C115961682 @default.
- W2126829482 hasConceptScore W2126829482C118615104 @default.
- W2126829482 hasConceptScore W2126829482C121332964 @default.
- W2126829482 hasConceptScore W2126829482C13481523 @default.
- W2126829482 hasConceptScore W2126829482C154945302 @default.
- W2126829482 hasConceptScore W2126829482C165021410 @default.
- W2126829482 hasConceptScore W2126829482C1769480 @default.
- W2126829482 hasConceptScore W2126829482C33923547 @default.
- W2126829482 hasConceptScore W2126829482C41008148 @default.
- W2126829482 hasConceptScore W2126829482C62520636 @default.
- W2126829482 hasConceptScore W2126829482C66656319 @default.
- W2126829482 hasConceptScore W2126829482C78548338 @default.
- W2126829482 hasConceptScore W2126829482C81081738 @default.
- W2126829482 hasConceptScore W2126829482C9417928 @default.
- W2126829482 hasIssue "6" @default.
- W2126829482 hasLocation W21268294821 @default.
- W2126829482 hasLocation W21268294822 @default.
- W2126829482 hasLocation W21268294823 @default.
- W2126829482 hasLocation W21268294824 @default.
- W2126829482 hasOpenAccess W2126829482 @default.
- W2126829482 hasPrimaryLocation W21268294821 @default.
- W2126829482 hasRelatedWork W1480872254 @default.
- W2126829482 hasRelatedWork W1489137 @default.
- W2126829482 hasRelatedWork W1519127572 @default.
- W2126829482 hasRelatedWork W1983052329 @default.
- W2126829482 hasRelatedWork W2083676471 @default.
- W2126829482 hasRelatedWork W2096168583 @default.
- W2126829482 hasRelatedWork W2153638435 @default.
- W2126829482 hasRelatedWork W2155161695 @default.
- W2126829482 hasRelatedWork W2159351027 @default.
- W2126829482 hasRelatedWork W2231051939 @default.
- W2126829482 hasVolume "47" @default.
- W2126829482 isParatext "false" @default.
- W2126829482 isRetracted "false" @default.
- W2126829482 magId "2126829482" @default.
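
The abstract above describes encoding the run lengths of sparse fingerprints with integer entropy codes and computing Tanimoto similarity exactly after lossless decoding. The following is a minimal sketch of that general idea using a plain Elias gamma code on the gaps between set-bit indices. It is not the paper's MOL or MOV scheme, it does not reorder components by frequency, and all function names and the toy fingerprints are assumptions made for illustration only.

```python
def elias_gamma_encode(n):
    """Return the Elias gamma code of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma is defined only for integers >= 1")
    binary = bin(n)[2:]  # binary form; the leading bit is always 1
    # Prefix with (length - 1) zeros so the decoder knows how many bits follow.
    return "0" * (len(binary) - 1) + binary


def encode_fingerprint(on_bits):
    """Encode sorted, distinct 0-based indices of set bits as gamma-coded gaps."""
    pieces = []
    previous = -1
    for index in on_bits:
        pieces.append(elias_gamma_encode(index - previous))  # gap is >= 1
        previous = index
    return "".join(pieces)


def decode_fingerprint(code):
    """Invert encode_fingerprint: recover the list of set-bit indices."""
    indices = []
    previous = -1
    pos = 0
    while pos < len(code):
        zeros = 0
        while code[pos] == "0":  # count the unary length prefix
            zeros += 1
            pos += 1
        gap = int(code[pos:pos + zeros + 1], 2)  # read zeros + 1 payload bits
        pos += zeros + 1
        previous += gap
        indices.append(previous)
    return indices


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of set-bit indices."""
    set_a, set_b = set(a), set(b)
    union = len(set_a | set_b)
    return len(set_a & set_b) / union if union else 1.0


# Hypothetical toy fingerprints, given as indices of the bits that are set.
fp_a = [0, 3, 4, 10]
fp_b = [0, 4, 7]

code_a = encode_fingerprint(fp_a)
code_b = encode_fingerprint(fp_b)

# Decoding is lossless, so the similarity equals the uncompressed value.
similarity = tanimoto(decode_fingerprint(code_a), decode_fingerprint(code_b))
```

Because small gaps receive short codewords, a code of this kind tends to be compact when set bits cluster at low indices, which is the situation the abstract reports after reordering components by decreasing frequency.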