Matches in SemOpenAlex for { <https://semopenalex.org/work/W19269568> ?p ?o ?g. }
- W19269568 abstract "Scientific discovery in data-rich domains (e.g., biological sciences, atmospheric sciences) presents several challenges in information extraction and knowledge acquisition from heterogeneous, distributed, autonomously operated, dynamic data sources. This paper describes these problems and outlines the key elements of algorithmic and systems solutions for computer-assisted scientific discovery in such domains. These include: ontology-assisted approaches to customizable data integration and information extraction from heterogeneous, distributed data sources; distributed data mining algorithms for knowledge acquisition from large, distributed data sets which obviate the need for transmitting large volumes of data across the network; ontology-driven approaches to exploratory data analysis from alternative ontological perspectives; and modular and extensible agent-based implementations of the algorithms within a platform-independent agent infrastructure. Prototype implementations of the proposed system are being used for discovery of macromolecular structure-function relationships in computational biology and for distributed, coordinated intrusion detection in computer networks. Challenges in Integration and Analysis of Heterogeneous Distributed Data. The development of high-throughput data acquisition technologies in biological sciences, together with advances in digital storage, computing, and communications technologies, has resulted in unprecedented opportunities for large-scale, computer-assisted, data-driven scientific discovery [Baxevanis et al., 1999]. Data sets of interest to computational biologists are often heterogeneous in structure, content, and semantics. 
Examples include sequence data (DNA, RNA, and protein sequences, expressed sequence tags) [Benson et al., 1997; Boguski et al., 1997]; numeric measurements (e.g., gene expression data); symbolic data describing relations among entities; structured or semi-structured text (e.g., annotations associated with DNA sequences, protein structures, and gene expression data); temporal data (e.g., gene expression time series); structures containing numeric as well as symbolic information (e.g., 3-dimensional protein structures); and results of various types of analysis [Baxevanis, 2000; Discala et al., 2000]. These data are currently stored in flat files, relational databases, and object-oriented databases. The term biological database is used loosely to refer to a biological data collection in any of these forms. How best to organize genome data is still a matter of debate [Frenkel, 1991; Gelbart, 1998], although several object-oriented databases have been proposed in recent years [Gray, 1990; Goodman, 1995; Ghosh, 1999; Durbin, 1991]. Applications such as characterization of macromolecular structure-function relationships and inference of genetic regulatory pathways require selection and extraction of relevant information from such data (e.g., features from sequences, counts and statistical summaries from measurements, structured representations of relevant information from textual annotations). They also call for integrating data from multiple sources into a coherent form that lends itself to further analysis (e.g., data mining) by bridging the syntactic and semantic gaps among them. Typical data analysis tasks that arise in computational biology are difficult to express using standard query languages, and thus application programs have to be constructed using program libraries. 
While queries expressed in declarative languages like SQL are still useful in biological databases, the use of programming interfaces is unavoidable for many types of data analysis (e.g., data mining). This follows from the fact that the same set of data may have to be analyzed in different ways depending on the information extraction and knowledge acquisition objectives of the user. It is impossible to foresee all the potential uses of data when designing data repositories or data analysis services. The data sources of interest in computational molecular biology are large, diverse in structure and content, and typically autonomously maintained [Fasman, 1994]. Transforming these data into useful knowledge (e.g., inference of genetic networks from gene expression data, building predictive models of protein function from protein sequence) calls for algorithmic and systems solutions for computer-assisted knowledge acquisition and for data and knowledge visualization. Machine learning algorithms [Mitchell, 1997] currently offer one of the most cost-effective approaches to data-driven knowledge acquisition (discovery of features, correlations, and other complex relationships and hypotheses that describe potentially interesting regularities in large data sets) in increasingly data-rich domains such as computational biology [Baldi and Brunak, 1998]. However, application of machine learning algorithms to large-scale knowledge discovery from" @default.
- W19269568 created "2016-06-24" @default.
- W19269568 creator A5004737962 @default.
- W19269568 creator A5045214155 @default.
- W19269568 creator A5050678997 @default.
- W19269568 creator A5067341711 @default.
- W19269568 date "2002-01-01" @default.
- W19269568 modified "2023-09-30" @default.
- W19269568 title "Ontology-Driven Information Extraction and Knowledge Acquisition from Heterogeneous, Distributed, Autonomous Biological Data Sources" @default.
- W19269568 cites W1498183065 @default.
- W19269568 cites W1503378001 @default.
- W19269568 cites W1516346799 @default.
- W19269568 cites W1540371141 @default.
- W19269568 cites W1541937697 @default.
- W19269568 cites W156305594 @default.
- W19269568 cites W1582401051 @default.
- W19269568 cites W1594031697 @default.
- W19269568 cites W1601142477 @default.
- W19269568 cites W1605533097 @default.
- W19269568 cites W1638026557 @default.
- W19269568 cites W1763728792 @default.
- W19269568 cites W1824433965 @default.
- W19269568 cites W1933674367 @default.
- W19269568 cites W1977746237 @default.
- W19269568 cites W1981596122 @default.
- W19269568 cites W1988790447 @default.
- W19269568 cites W1993430431 @default.
- W19269568 cites W1994557482 @default.
- W19269568 cites W2000730583 @default.
- W19269568 cites W2028991148 @default.
- W19269568 cites W2035044274 @default.
- W19269568 cites W2035923305 @default.
- W19269568 cites W2041956242 @default.
- W19269568 cites W2060462217 @default.
- W19269568 cites W2061282752 @default.
- W19269568 cites W2071427873 @default.
- W19269568 cites W2079191620 @default.
- W19269568 cites W2080402044 @default.
- W19269568 cites W2084981891 @default.
- W19269568 cites W2093825590 @default.
- W19269568 cites W2099552479 @default.
- W19269568 cites W2103017472 @default.
- W19269568 cites W2105941683 @default.
- W19269568 cites W2108874748 @default.
- W19269568 cites W2111692891 @default.
- W19269568 cites W2129919467 @default.
- W19269568 cites W2138745909 @default.
- W19269568 cites W2139060750 @default.
- W19269568 cites W2149706766 @default.
- W19269568 cites W2158030545 @default.
- W19269568 cites W2201622286 @default.
- W19269568 cites W2530153978 @default.
- W19269568 cites W2799002609 @default.
- W19269568 cites W2915003108 @default.
- W19269568 cites W3021734875 @default.
- W19269568 cites W379208378 @default.
- W19269568 cites W64662365 @default.
- W19269568 hasPublicationYear "2002" @default.
- W19269568 type Work @default.
- W19269568 sameAs 19269568 @default.
- W19269568 citedByCount "17" @default.
- W19269568 countsByYear W192695682013 @default.
- W19269568 countsByYear W192695682014 @default.
- W19269568 countsByYear W192695682015 @default.
- W19269568 countsByYear W192695682020 @default.
- W19269568 crossrefType "journal-article" @default.
- W19269568 hasAuthorship W19269568A5004737962 @default.
- W19269568 hasAuthorship W19269568A5045214155 @default.
- W19269568 hasAuthorship W19269568A5050678997 @default.
- W19269568 hasAuthorship W19269568A5067341711 @default.
- W19269568 hasConcept C101468663 @default.
- W19269568 hasConcept C111472728 @default.
- W19269568 hasConcept C115903868 @default.
- W19269568 hasConcept C120567893 @default.
- W19269568 hasConcept C124101348 @default.
- W19269568 hasConcept C138885662 @default.
- W19269568 hasConcept C199360897 @default.
- W19269568 hasConcept C201797286 @default.
- W19269568 hasConcept C23123220 @default.
- W19269568 hasConcept C2522767166 @default.
- W19269568 hasConcept C25810664 @default.
- W19269568 hasConcept C26713055 @default.
- W19269568 hasConcept C41008148 @default.
- W19269568 hasConcept C60644358 @default.
- W19269568 hasConcept C72634772 @default.
- W19269568 hasConcept C86803240 @default.
- W19269568 hasConceptScore W19269568C101468663 @default.
- W19269568 hasConceptScore W19269568C111472728 @default.
- W19269568 hasConceptScore W19269568C115903868 @default.
- W19269568 hasConceptScore W19269568C120567893 @default.
- W19269568 hasConceptScore W19269568C124101348 @default.
- W19269568 hasConceptScore W19269568C138885662 @default.
- W19269568 hasConceptScore W19269568C199360897 @default.
- W19269568 hasConceptScore W19269568C201797286 @default.
- W19269568 hasConceptScore W19269568C23123220 @default.
- W19269568 hasConceptScore W19269568C2522767166 @default.
- W19269568 hasConceptScore W19269568C25810664 @default.
- W19269568 hasConceptScore W19269568C26713055 @default.
- W19269568 hasConceptScore W19269568C41008148 @default.
- W19269568 hasConceptScore W19269568C60644358 @default.
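The abstract mentions distributed data mining algorithms that learn from large, distributed data sets without transmitting the raw data across the network. The sketch below illustrates one well-known way to achieve this, under stated assumptions: the learner is taken to be Naive Bayes (our choice; the abstract does not name a learner), and each site ships only class and feature-value counts, which are sufficient statistics for that model. The function names `local_counts`, `merge_counts`, and `predict` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log

# Assumption: each site holds (features, label) records that never leave the site.
# Only the count tables below are sent to the coordinating node.


def local_counts(records):
    """Compute Naive Bayes sufficient statistics (counts) at one site."""
    class_counts = Counter()
    feature_counts = Counter()
    for features, label in records:
        class_counts[label] += 1
        for i, value in enumerate(features):
            feature_counts[(i, value, label)] += 1
    return class_counts, feature_counts


def merge_counts(site_stats):
    """Sum per-site counts; the totals equal counts over the pooled data."""
    total_classes, total_features = Counter(), Counter()
    for class_counts, feature_counts in site_stats:
        total_classes.update(class_counts)
        total_features.update(feature_counts)
    return total_classes, total_features


def predict(class_counts, feature_counts, features, n_values):
    """Return the class with the highest Laplace-smoothed log posterior.

    n_values[i] is the number of distinct values feature i can take.
    """
    n = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, c in class_counts.items():
        score = log(c / n)
        for i, value in enumerate(features):
            score += log((feature_counts[(i, value, label)] + 1) / (c + n_values[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Because the merged counts are identical to the counts computed over the union of all sites, the resulting classifier is the same as one trained on centralized data, while only small count tables cross the network. Whether this matches the paper's specific algorithms is not stated in the abstract.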
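The abstract also describes ontology-assisted data integration that bridges syntactic and semantic gaps among heterogeneous sources. The following is a minimal sketch, assuming a hand-written mapping from each source's field names to shared ontology terms, with a per-field value normalizer. The source names (`source_a`, `source_b`), field names, and ontology terms are all hypothetical illustrations; an actual ontology-driven system would likely use richer, user-customizable mappings.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical mapping: source field -> (shared ontology term, value converter).
Mapping = Dict[str, Tuple[str, Callable[[object], object]]]

SOURCE_MAPPINGS: Dict[str, Mapping] = {
    "source_a": {
        "prot_id": ("protein_id", str),
        "organism": ("species", lambda v: str(v).strip().lower()),
        "len": ("sequence_length", int),
    },
    "source_b": {
        "accession": ("protein_id", str),
        "species_name": ("species", lambda v: str(v).strip().lower()),
        "length_aa": ("sequence_length", int),
    },
}


def to_ontology_view(source: str, record: Dict[str, object]) -> Dict[str, object]:
    """Rename and normalize one record; fields without a mapping are dropped."""
    mapping = SOURCE_MAPPINGS[source]
    out: Dict[str, object] = {}
    for field, value in record.items():
        if field in mapping:
            term, convert = mapping[field]
            out[term] = convert(value)
    return out


def integrate(batches: List[Tuple[str, List[Dict[str, object]]]]) -> List[Dict[str, object]]:
    """Translate records from every source into the shared vocabulary."""
    return [to_ontology_view(src, rec) for src, records in batches for rec in records]
```

For example, a record `{"prot_id": "P1", "organism": " Homo sapiens", "len": "120"}` from `source_a` and a record `{"accession": "Q9", "species_name": "HOMO SAPIENS", "length_aa": 98}` from `source_b` both come out keyed by `protein_id`, `species`, and `sequence_length`, with the species normalized to `"homo sapiens"`, so downstream analysis can treat them uniformly.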