Data Science for Business is aimed at practitioners of this relatively young field, from beginners to advanced users. It should also help newcomers preparing for job interviews that span the full range of data science problems.
Data science naturally forms the foundation of data-driven businesses. It has also proven useful for capturing customer buying patterns in department store chains and for managing inventory based on intelligent data analysis.
An analogy can be drawn with weather forecasting, a complex challenge whose benefits include timely warnings of major and minor weather events and safer navigation on waterways, airways, and roads. These benefits translate into safe and secure long-distance travel, a growing world economy, better avoidance or management of natural disasters, and a more predictable life for ordinary people.
Data science draws on techniques and theories from many fields, including mathematics, statistics, data engineering, pattern recognition and learning, advanced computing, visualization, uncertainty modeling, data warehousing, and high-performance computing, with the goal of extracting meaning from data and creating data products.
The book explains how statistical techniques used in data processing and data mining improve our understanding of data. Depending on the type of model being applied, data preparation and data cleaning may also be needed. The book also discusses a range of models, tracing predictive modeling from correlation to supervised segmentation. In some scenarios, a model can be built from an existing dataset by induction. The procedure that creates a model from the data is called the induction algorithm, or learner.
One method the book explains for learning a predictive model from a dataset is to specify the structure of the model while leaving certain numeric parameters unspecified. Data mining then calculates the best parameter values for a particular set of training data. The data miner specifies the form of the model and the attributes, and the goal of data mining is to tune the parameters so that the model fits the data as well as possible. This general approach is called parameter learning, or parametric modeling.
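To make parameter learning concrete, here is a minimal sketch that assumes the simplest possible model form: a straight line y = w0 + w1 * x over a single numeric attribute. The form is fixed in advance, and only the two parameters are fitted to the training data. Least squares is assumed as the fit criterion. The function names `fit_line` and `predict` and the toy data are hypothetical illustrations and are not taken from the book.

```python
from __future__ import annotations


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (w0, w1) minimizing the sum of squared errors of y ~ w0 + w1 * x.

    The model *form* (a line) is fixed by the data miner; only the numeric
    parameters w0 (intercept) and w1 (slope) are learned from the data.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w1 = sxy / sxx             # slope that best fits the training data
    w0 = mean_y - w1 * mean_x  # intercept: the line passes through the means
    return w0, w1


def predict(w0: float, w1: float, x: float) -> float:
    """Apply the learned parameters to a new example."""
    return w0 + w1 * x


if __name__ == "__main__":
    # Hypothetical training data (e.g. ad spend vs. sales).
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [3.0, 5.0, 7.0, 9.0]
    w0, w1 = fit_line(xs, ys)
    print(w0, w1)  # -> 1.0 2.0
```

These toy points lie exactly on y = 1 + 2x, so the fit recovers w0 = 1.0 and w1 = 2.0. Real data would leave some residual error, and the least-squares criterion chooses the parameters that make that error as small as possible.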
Other topics discussed in the book include probability estimation, nonlinear functions, support vector machines, and neural networks. One notable pitfall in data science is overfitting: finding chance occurrences in the data that look like interesting patterns but do not generalize. Generally, the more complex a model is allowed to be, the more it tends to overfit.
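The link between complexity and overfitting shows up in a small, self-contained experiment. This sketch is illustrative and does not come from the book. It fits two models to the same six training points. The first is a straight line with two parameters. The second is a polynomial that passes through every training point, which requires six parameters. The data are hypothetical: an underlying trend y = 2x plus a fixed, alternating +/-0.5 "noise" that keeps the run reproducible. Both models are scored by mean squared error (MSE) on the training points and on holdout points that lie between them.

```python
def fit_line_model(xs, ys):
    """Least-squares straight line (2 parameters); returns a predict function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every training point (len(xs) parameters)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def noise(k):
    """Hypothetical, deterministic noise: +0.5 for even k, -0.5 for odd k."""
    return 0.5 if k % 2 == 0 else -0.5


# Training data: x = 0..5 on the trend y = 2x, plus noise.
train_x = [float(i) for i in range(6)]
train_y = [2 * x + noise(i) for i, x in enumerate(train_x)]
# Holdout data: points halfway between training points, with their own noise.
test_x = [i + 0.5 for i in range(5)]
test_y = [2 * x + noise(i) for i, x in enumerate(test_x)]

for name, fit in [("line", fit_line_model), ("degree-5 polynomial", fit_interpolant)]:
    model = fit(train_x, train_y)
    print(f"{name}: train MSE={mse(model, train_x, train_y):.3f}, "
          f"holdout MSE={mse(model, test_x, test_y):.3f}")
```

On this data the line has a training MSE of 0.229 and a holdout MSE of 0.265. The polynomial has a training MSE of 0.000, because it fits every training point exactly, but its holdout MSE rises to 0.931. The more complex model has memorized the noise rather than the trend, and it generalizes worse as a result.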
The book also discusses nearest-neighbor reasoning in detail. This approach measures similarity as a distance between the descriptions of data items. Clustering, link prediction, and social recommendation are covered in the same detail.
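Nearest-neighbor reasoning can be sketched in a few lines. In the example below, each item is described by a numeric feature vector, similarity is measured as Euclidean distance between those descriptions, and a new item takes the majority label of its k closest training items. The customer data (age, income in thousands) and the function name `knn_predict` are hypothetical and serve only to illustrate the idea. Euclidean distance and majority voting are one common choice among several, not the only one the technique allows.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training items.

    train: list of (features, label) pairs, where features is a tuple of numbers.
    Distance between descriptions is Euclidean (math.dist, Python 3.8+).
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical customers: (age, income in $1000s) -> responded to an offer?
customers = [
    ((25, 40), "no"),
    ((30, 45), "no"),
    ((23, 35), "no"),
    ((35, 60), "yes"),
    ((45, 80), "yes"),
    ((50, 85), "yes"),
]

print(knn_predict(customers, (40, 70)))  # nearest three are all "yes"
print(knn_predict(customers, (27, 42)))  # nearest three are all "no"
```

In practice, attributes measured on very different scales are usually rescaled before computing distances, so that one attribute does not dominate the similarity measure. The sketch omits that step to keep the focus on the neighbor-voting idea.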
The skill sets and competencies that data scientists employ vary widely. Data scientists are integral to competitive intelligence and business analytics, an emerging field that encompasses activities such as data mining and analysis that can help businesses gain a competitive edge.
A major goal of data science is to make it easier for end users to find and interpret data, so that they can make informed decisions and give themselves the best possible chance of success.
In summary, the book is highly recommended for readers ranging from students to advanced practitioners, thanks to the lucidity of its explanations and the breadth of the fundamental data science topics it covers.