Matches in SemOpenAlex for { <https://semopenalex.org/work/W4295217226> ?p ?o ?g. }
Showing items 1 to 84 of
84
with 100 items per page.
- W4295217226 abstract "The rise in demand for more FAIR (Findable, Accessible, Interoperable, and Reusable) data is being answered by increasingly automated ways to capture, process, publish and register biodiversity datasets. Coupled with the increasing possibilities for detecting hundreds of species in a single sample/event (i.e., eDNA), this results in taxonomic information that is multiple levels of magnitude higher than it was a couple of years ago. This spike in content has an adverse effect on the ability of researchers to find relevant datasets within catalogues, due to the limitations in storing and displaying the taxonomic metadata (e.g., in the real-estate of a webpage, in the timeframe required to access the quantity of information, in displaying information to users in a comprehensive and comprehensible way). WoRMS (World Register of Marine Species) is a taxonomic backbone that provides species information. One user of WoRMS is the Integrated Marine Information System (IMIS), a metadata catalogue for marine data, which is also the metadata catalogue of the European node of the Ocean Biodiversity Information System (EurOBIS). Taxonomic information added to the metadata records in IMIS are linked to WoRMS Persitent Identifiers or PIDs (AphiaIDs). Tension between providing all the taxonomic metadata while not overloading the catalogue is being addressed for the use-case of WoRMS+IMIS+EurOBIS. Our approach is to apply a filter-and-replace algorithm during the automated registration of the taxonomic metadata to describe available datasets. This technique reduces the detailed taxonomic information of actual occurrences in the dataset content into practical (good enough) metadata. It takes as input all the species in the dataset along with their hierarchical structure, as well as a configuration parameter allowing for an upper bound to the acceptable number of taxa to be output. The core principle of the algorithm is to start off with the minimal result set containing only the hierarchical root (always biota) of the complete taxonomy in the dataset, and then to gradually consider replacing each element with its children one level deeper, as long as that replacement keeps fitting the upper bound for the total set. This approach ensures that no coverage is lost, meaning every taxon in the actual dataset is represented in the result, although possibly through one of its parents, X layers up. Note that as long as only one child is underlying, the switch will always happen. So by nature, it will go down to the lowest relevant detail without challenging the upper bound limit. It also allows for variation in the actual processing by allowing for different ordering strategies on the current result-set. Ordering strategies under test are: Order (descending) by weight of underlying available children → favouring more detail in those parts of the tree that have the most members in the set With weight defined as the count of all underlying available species, OR Weight defined as the sum-product of those species with their actual occurrences in the dataset (thus further favouring detail to those parts of the tree that are more prevalent in the samples) Order (descending) by number of direct children → favouring fanning-out over available high-level siblings Order (ascending) by ratio of present children over available children in the taxa → favouring replacing too vague parents with the more specific sublevels that are actually in the dataset Order (descending) by weight of underlying available children → favouring more detail in those parts of the tree that have the most members in the set With weight defined as the count of all underlying available species, OR Weight defined as the sum-product of those species with their actual occurrences in the dataset (thus further favouring detail to those parts of the tree that are more prevalent in the samples) With weight defined as the count of all underlying available species, OR Weight defined as the sum-product of those species with their actual occurrences in the dataset (thus further favouring detail to those parts of the tree that are more prevalent in the samples) Order (descending) by number of direct children → favouring fanning-out over available high-level siblings Order (ascending) by ratio of present children over available children in the taxa → favouring replacing too vague parents with the more specific sublevels that are actually in the dataset Alongside this basic approach, additional pre-processing of the taxa in the dataset can apply some form of pruning. In this approach all nodes in the available taxon tree of the dataset are ordered (descending) by the weight of underlying children, and those at the end—below some defined cutoff ratio (extra parameter to the algorithm)—are simply discarded. Again, the interpretation of this weight (only species count, or multiplied by occurrence count) yields to variants of the algorithm to be tested. Applying this pruning means a deliberate departure is taken from the full-coverage guarantee mentioned earlier. Caution should be applied, of course, but removing more irrelevant (low occurrence) parts of the tree will allow for making space in the bounded result-set for more detail in the parts that are relevant. In order to create an objective basis for comparing the resulting variants, a number of qualification parameters are considered to quantify the effect of the suggested reduction. Based on the datasets in EurOBIS, the variants of this algorithm are being applied and results will be presented on how they affect the various qualification parameters. It is worth observing that both datasets and their metadata records are distinct resources that are being linked to species (taxa references) and that the different purposes they serve require different levels of detail to be presented. As other types of entities (publications, habitats, experts, geography, traits, etc.) are considered for linking to species, we believe similar reduction algorithms will be necessary." @default.
- W4295217226 created "2022-09-12" @default.
- W4295217226 creator A5029849662 @default.
- W4295217226 creator A5035487276 @default.
- W4295217226 creator A5040337361 @default.
- W4295217226 creator A5074084744 @default.
- W4295217226 creator A5074220278 @default.
- W4295217226 creator A5074768260 @default.
- W4295217226 creator A5077230308 @default.
- W4295217226 date "2022-09-09" @default.
- W4295217226 modified "2023-09-26" @default.
- W4295217226 title "An Algorithmic Approach to Reducing Taxonomic Detail from Actual Datasets to their Metadata Representation to Increase Findabilty" @default.
- W4295217226 doi "https://doi.org/10.3897/biss.6.94603" @default.
- W4295217226 hasPublicationYear "2022" @default.
- W4295217226 type Work @default.
- W4295217226 citedByCount "0" @default.
- W4295217226 crossrefType "journal-article" @default.
- W4295217226 hasAuthorship W4295217226A5029849662 @default.
- W4295217226 hasAuthorship W4295217226A5035487276 @default.
- W4295217226 hasAuthorship W4295217226A5040337361 @default.
- W4295217226 hasAuthorship W4295217226A5074084744 @default.
- W4295217226 hasAuthorship W4295217226A5074220278 @default.
- W4295217226 hasAuthorship W4295217226A5074768260 @default.
- W4295217226 hasAuthorship W4295217226A5077230308 @default.
- W4295217226 hasBestOaLocation W42952172261 @default.
- W4295217226 hasConcept C112698675 @default.
- W4295217226 hasConcept C124101348 @default.
- W4295217226 hasConcept C130217890 @default.
- W4295217226 hasConcept C136764020 @default.
- W4295217226 hasConcept C136976847 @default.
- W4295217226 hasConcept C144133560 @default.
- W4295217226 hasConcept C153048206 @default.
- W4295217226 hasConcept C154504017 @default.
- W4295217226 hasConcept C18903297 @default.
- W4295217226 hasConcept C189592816 @default.
- W4295217226 hasConcept C193150823 @default.
- W4295217226 hasConcept C199360897 @default.
- W4295217226 hasConcept C20136886 @default.
- W4295217226 hasConcept C23123220 @default.
- W4295217226 hasConcept C30872290 @default.
- W4295217226 hasConcept C41008148 @default.
- W4295217226 hasConcept C41458344 @default.
- W4295217226 hasConcept C71640776 @default.
- W4295217226 hasConcept C86803240 @default.
- W4295217226 hasConcept C93518851 @default.
- W4295217226 hasConceptScore W4295217226C112698675 @default.
- W4295217226 hasConceptScore W4295217226C124101348 @default.
- W4295217226 hasConceptScore W4295217226C130217890 @default.
- W4295217226 hasConceptScore W4295217226C136764020 @default.
- W4295217226 hasConceptScore W4295217226C136976847 @default.
- W4295217226 hasConceptScore W4295217226C144133560 @default.
- W4295217226 hasConceptScore W4295217226C153048206 @default.
- W4295217226 hasConceptScore W4295217226C154504017 @default.
- W4295217226 hasConceptScore W4295217226C18903297 @default.
- W4295217226 hasConceptScore W4295217226C189592816 @default.
- W4295217226 hasConceptScore W4295217226C193150823 @default.
- W4295217226 hasConceptScore W4295217226C199360897 @default.
- W4295217226 hasConceptScore W4295217226C20136886 @default.
- W4295217226 hasConceptScore W4295217226C23123220 @default.
- W4295217226 hasConceptScore W4295217226C30872290 @default.
- W4295217226 hasConceptScore W4295217226C41008148 @default.
- W4295217226 hasConceptScore W4295217226C41458344 @default.
- W4295217226 hasConceptScore W4295217226C71640776 @default.
- W4295217226 hasConceptScore W4295217226C86803240 @default.
- W4295217226 hasConceptScore W4295217226C93518851 @default.
- W4295217226 hasLocation W42952172261 @default.
- W4295217226 hasLocation W42952172262 @default.
- W4295217226 hasLocation W42952172263 @default.
- W4295217226 hasOpenAccess W4295217226 @default.
- W4295217226 hasPrimaryLocation W42952172261 @default.
- W4295217226 hasRelatedWork W1529535723 @default.
- W4295217226 hasRelatedWork W1534949906 @default.
- W4295217226 hasRelatedWork W1864352914 @default.
- W4295217226 hasRelatedWork W196221469 @default.
- W4295217226 hasRelatedWork W2071978624 @default.
- W4295217226 hasRelatedWork W2147875950 @default.
- W4295217226 hasRelatedWork W2151972543 @default.
- W4295217226 hasRelatedWork W2377447660 @default.
- W4295217226 hasRelatedWork W2991657003 @default.
- W4295217226 hasRelatedWork W1888439473 @default.
- W4295217226 hasVolume "6" @default.
- W4295217226 isParatext "false" @default.
- W4295217226 isRetracted "false" @default.
- W4295217226 workType "article" @default.