Book Review: Data Analysis With Open Source Tools by Philipp K. Janert

Data is produced daily by the megabyte and the gigabyte. What do you do with all that information? How do you turn it into profitable knowledge? Philipp K. Janert's Data Analysis with Open Source Tools is written to help the reader tackle this thorny problem by providing an orientation to general data analysis in a business environment. The book discusses ways of investigating data to recover the structures that produced it, capturing those structures in models, and sharing the models' implications with the organization through business plans, metric dashboards and other methods.

While the book is a great conceptual outline of the data analysis process, it is not for beginners: it lacks the worked examples and granular discussion of a textbook. The book is written primarily for programmers, readers who can take a concept and implement it in a programming language of their own choosing, and so it suits those who already have a certain proficiency in analyzing problems and thinking analytically. In the case of kernel density estimates and splines, for example, there is only a discussion of the formula and no example implementation, as the reader is assumed to be able to write one unaided.
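To give a sense of the step that is left to the reader, here is a minimal sketch of a Gaussian kernel density estimate in Python. It is not taken from the book. It uses the standard textbook form, in which the estimate at x is the average of Gaussian bumps centered on each sample and scaled by a bandwidth h. The function names, the sample values and the bandwidth are assumptions chosen for illustration only.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, samples, bandwidth):
    """Estimate the density at x from samples with a Gaussian kernel.

    Computes (1 / (n * h)) * sum(K((x - x_i) / h)) over all samples x_i.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if not samples:
        raise ValueError("samples must be non-empty")
    total = sum(gaussian_kernel((x - s) / bandwidth) for s in samples)
    return total / (len(samples) * bandwidth)


# Made-up sample values and bandwidth, for illustration only.
samples = [1.0, 1.5, 2.0, 4.0]
density_at_2 = kde(2.0, samples, bandwidth=0.5)
print(density_at_2)
```

The bandwidth controls how smooth the resulting curve is: a small value follows the individual points closely, while a large one blurs them together. Choosing it well is the part that takes judgment.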

The book has four sections: graphing data, modeling data, mining data and using data. In the “Workshop” sections that follow each chapter, it discusses a number of open source data tools such as R, Sage and Python. The workshops are meant to explore the purpose of various open source tools and libraries, and Janert discusses the architecture of the tool libraries to give the reader an idea of whether a tool is worth “spending time on.”

There is a certain level of math here that, depending on your math training, may induce anything from boredom, intimidation and frustration to excitement. A background in calculus and some knowledge of statistics are helpful but not necessary. Janert does a good job of explaining the mathematical concepts he presents, so it is possible to keep up even without fully understanding every notion. The book has its limits: it is not meant to be a book on the analysis of scientific data, formal statistical analysis, network analysis, text mining, or Big Data.

Article Author: A. Jurek

A. Jurek is co-editor of the Culture section at Blogcritics. Write A. Jurek at a.jurek@blogcritics.org

Article comments

  • 1 - anonymous

    Jan 19, 2011 at 1:12 pm

    The tools are outdated and the methodology is sometimes incorrect. Had it been published in 1990, the book would have been quite OK.
