Matches in SemOpenAlex for { <https://semopenalex.org/work/W2007104492> ?p ?o ?g. }
- W2007104492 abstract "Cheminformatics has been defined as the application of informatics methods to solve chemical problems [1]. Such chemical problems are often represented in terms of data, be it activity data for a series of compounds or descriptor values for a compound library. While this new book from the O'Reilly stable is not aimed specifically at cheminformaticians, the subtitle of A Hands-On Guide for Programmers and Scientists makes it clear that the target audience includes any scientists whose day-to-day work involves analysing and interpreting data. The book is broadly divided into four parts on Graphics: Looking at Data, Analytics: Modeling Data, Computation: Mining Data, and Applications: Using Data. First of all, it should be noted that this is not a book about (as Chapter 1 states explicitly). Neither is it a manual for numpy, Sage, matplotlib, Gnuplot, R and so forth, as might be implied by the title. Instead, Janert focuses on discussing data analysis methods and techniques in depth, rather than skimming topics by following a cookbook or tutorial approach linked to particular software. This is as it should be: there are already documentation and manuals available for all of these programs, and the reader is simply alerted to the availability of the software; its capabilities are described and some examples of use are shown. This is a real practitioner's book. Janert, a former physicist and software engineer, is a consultant in data analysis and mathematical modelling. He has taken his hard-won knowledge and tried to get it all down on paper for the reader's benefit. 
For example, in a chapter with the provocative title of What you really need to know about classical statistics, he explains why introductory textbooks seem to cover methods and topics at odds with the problems data analysts deal with day-to-day; essentially, classical methods were developed at a time of small and expensive datasets and no computational power, and hypothesis testing focused on determining whether an effect existed. Today we have ample computing power and may be dealing with very large datasets; also, we are usually more interested in the size of an effect (practical significance) rather than just whether it exists (statistical significance). Topics that could not be squeezed into a chapter proper have been placed in shorter Intermezzos at the end of each section. For example, a short section on What about map/reduce? at the end of Mining Data reminds the reader that the map/reduce methodology (much hyped recently) is not a clever algorithm to speed things up, but rather a piece of infrastructure that makes it convenient to implement algorithms that are trivially parallelisable. On the negative side, any cheminformatician who has been involved with QSAR studies will already be familiar with the multivariate analysis methods discussed here (Chapters 13 and 14), although I liked the observation that you will actually spend more time on data sets that are totally worthless in relation to clustering algorithms. Also there are two chapters (out of 19) which will be of little interest as they focus on business intelligence and financial calculations, although even there the reader will find an introduction to the use of Berkeley DB and SQLite from Python, tools which I highly recommend. There are also cases where the author perhaps gives too much detail, but this is hardly a criticism: in a book of some 500 pages there is plenty of room. Overall though, I heartily recommend this book to anyone working in cheminformatics whether they develop methods or apply them. 
Too often we rely on summary statistics such as mean and standard deviation and forget to actually look at the data. Graphical analysis gives a feel for the data, and can often highlight problems, interesting features, or mistaken assumptions. After reading this book, readers should be very aware of both the advantages and pitfalls of a wide variety of analysis methods but will also be reminded that the goal of data analysis is not a picture or a number but insight." @default.
- W2007104492 created "2016-06-24" @default.
- W2007104492 creator A5036721569 @default.
- W2007104492 date "2011-03-24" @default.
- W2007104492 modified "2023-09-26" @default.
- W2007104492 title "Review of Data Analysis with Open Source Tools by Philipp K Janert" @default.
- W2007104492 cites W4206527087 @default.
- W2007104492 doi "https://doi.org/10.1186/1758-2946-3-10" @default.
- W2007104492 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/3072350" @default.
- W2007104492 hasPublicationYear "2011" @default.
- W2007104492 type Work @default.
- W2007104492 sameAs 2007104492 @default.
- W2007104492 citedByCount "3" @default.
- W2007104492 countsByYear W20071044922017 @default.
- W2007104492 countsByYear W20071044922018 @default.
- W2007104492 countsByYear W20071044922022 @default.
- W2007104492 crossrefType "journal-article" @default.
- W2007104492 hasAuthorship W2007104492A5036721569 @default.
- W2007104492 hasBestOaLocation W20071044921 @default.
- W2007104492 hasConcept C41008148 @default.
- W2007104492 hasConceptScore W2007104492C41008148 @default.
- W2007104492 hasIssue "1" @default.
- W2007104492 hasLocation W20071044921 @default.
- W2007104492 hasLocation W20071044922 @default.
- W2007104492 hasLocation W20071044923 @default.
- W2007104492 hasLocation W20071044924 @default.
- W2007104492 hasOpenAccess W2007104492 @default.
- W2007104492 hasPrimaryLocation W20071044921 @default.
- W2007104492 hasRelatedWork W1596801655 @default.
- W2007104492 hasRelatedWork W2130043461 @default.
- W2007104492 hasRelatedWork W2350741829 @default.
- W2007104492 hasRelatedWork W2358668433 @default.
- W2007104492 hasRelatedWork W2376932109 @default.
- W2007104492 hasRelatedWork W2382290278 @default.
- W2007104492 hasRelatedWork W2390279801 @default.
- W2007104492 hasRelatedWork W2748952813 @default.
- W2007104492 hasRelatedWork W2899084033 @default.
- W2007104492 hasRelatedWork W2530322880 @default.
- W2007104492 hasVolume "3" @default.
- W2007104492 isParatext "false" @default.
- W2007104492 isRetracted "false" @default.
- W2007104492 magId "2007104492" @default.
- W2007104492 workType "article" @default.