DataScience.US
A Data Professionals Community

# 3 Components that Underlie Predictive Analytics

## The power of Machine Learning Techniques


Let’s start by understanding “Predictive Analytics”. The term evolved from “Descriptive Analytics”, or just plain “Analytics”. Descriptive analytics refers to the process of distilling large amounts of data into summary information that humans can consume more easily. Typical techniques include counts and averages, used to answer a question such as “What were my average sales by region last quarter?” By its nature, descriptive analytics is a backward-looking view of “what happened.”
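To make the descriptive case concrete, here is a minimal sketch of the “average sales by region” question in Python. The records, region names, and the `average_sales_by_region` helper are all made up for illustration; a real project would more likely read the data from a database or spreadsheet.

```python
from collections import defaultdict

# Hypothetical sales records for last quarter: (region, sale amount in $).
sales_records = [
    ("North", 1200.0),
    ("South", 800.0),
    ("North", 1000.0),
    ("West", 1500.0),
    ("South", 1000.0),
]


def average_sales_by_region(records):
    """Return a dict mapping each region to its mean sale amount."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for region, amount in records:
        totals[region] += amount
        counts[region] += 1
    return {region: totals[region] / counts[region] for region in totals}


averages = average_sales_by_region(sales_records)
for region, avg in sorted(averages.items()):
    print(f"{region}: {avg:.2f}")
```

Nothing here looks forward: the code only summarizes what already happened.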

As a natural progression, Predictive Analytics attempts to answer the question “what might happen in the future?”  In common usage, Predictive Analytics typically applies more advanced classical statistical techniques such as linear regression to answer a question such as “If I increase my advertising spending by 10%, how much will my sales increase next quarter?”
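As a rough illustration of the advertising question, the sketch below fits a straight line to past advertising spend and sales using ordinary least squares, then estimates the effect of a 10% increase in spend. The figures are invented and deliberately lie exactly on a line so the result is easy to check by hand; `fit_line` and `predict` are illustrative helper names, not part of any library.

```python
# Hypothetical quarterly history (illustrative, not real data), both in $k.
# The points are constructed to lie exactly on a line: sales = 70 + 4 * ad_spend.
ad_spend = [10.0, 12.0, 15.0, 18.0, 20.0]
sales = [110.0, 118.0, 130.0, 142.0, 150.0]


def fit_line(xs, ys):
    """Ordinary least squares for a single predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted sales for a given advertising spend."""
    return intercept + slope * x


slope, intercept = fit_line(ad_spend, sales)

current_spend = ad_spend[-1]          # most recent quarter
planned_spend = current_spend * 1.10  # 10% increase
expected_increase = (
    predict(slope, intercept, planned_spend)
    - predict(slope, intercept, current_spend)
)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"Expected sales increase: {expected_increase:.2f}k")
```

Note that the planned spend sits outside the range of the historical data, so the estimate rests on the assumption that the past relationship keeps holding.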

Three basic components underlie predictive analytics:

1. The Data: A predictive model is only as good as the historical data that underlies it. Google’s Chief Economist Hal Varian is known for saying that Google doesn’t have better models; it just has more data.

2. The Statistics: This is the set of mathematical techniques, ranging from basic to advanced, that are applied to the data to derive inference, meaning, and insight. The most common statistical technique used in predictive analytics is linear regression, which in practice involves an iterative process of selecting variables and testing their impact on the outcome.

3. The Assumptions: These are the things that are presumed to be true, with the most common being that the future will continue to be like the past.

## The use of Machine Learning for Predictive Analytics

Predictive analytics is a use case; machine learning is a technique.

With this framework for understanding predictive analytics, we can now consider why machine learning holds such potential power. The answer lies in the difference between classical statistics and machine learning techniques. Classical statistics relies on a human expert to formulate and test the relationship between cause and effect, e.g. the hypothesis that advertising is a driver of sales.

“Machine learning is today’s discontinuity.” (Jerry Yang)

Machine learning flips this process on its head: it starts with the outcome (e.g. how much my sales were) and teaches a computer to automatically uncover the factors that are driving that outcome. These relationships may be incredibly complex, including hundreds of possible causes, interactions, and non-linear responses. If done properly, the result can be a considerably more accurate predictive model, and one that can be retrained to adjust and improve as new data arrives.
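One simple way to picture “starting from the outcome” is a decision stump, the single-split building block of tree-based models. The sketch below is only an illustration and not a method the article prescribes: given a few candidate factors, it searches every factor and threshold for the split that best explains sales, without anyone telling it which factor matters. The data, factor names, and the `best_split` helper are hypothetical, and the split it finds indicates association rather than proof of cause.

```python
# Hypothetical records: candidate factors plus the observed outcome (sales in $k).
rows = [
    {"ad_spend": 10, "discount": 0, "temperature": 15, "sales": 100},
    {"ad_spend": 20, "discount": 0, "temperature": 25, "sales": 104},
    {"ad_spend": 12, "discount": 5, "temperature": 18, "sales": 101},
    {"ad_spend": 18, "discount": 10, "temperature": 30, "sales": 150},
    {"ad_spend": 11, "discount": 15, "temperature": 20, "sales": 152},
    {"ad_spend": 19, "discount": 20, "temperature": 12, "sales": 155},
]


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, features, target):
    """Try every feature and midpoint threshold; return the lowest-error split."""
    best = None  # (error, feature, threshold)
    for feature in features:
        levels = sorted({row[feature] for row in rows})
        for low, high in zip(levels, levels[1:]):
            threshold = (low + high) / 2
            left = [r[target] for r in rows if r[feature] <= threshold]
            right = [r[target] for r in rows if r[feature] > threshold]
            error = sse(left) + sse(right)
            if best is None or error < best[0]:
                best = (error, feature, threshold)
    return best


error, feature, threshold = best_split(
    rows, ["ad_spend", "discount", "temperature"], "sales"
)
print(f"Best single split: {feature} <= {threshold} (squared error {error:.2f})")
```

On this toy data the search settles on the discount factor, because splitting on it separates the low-sales and high-sales records most cleanly. Real machine learning models combine many such splits or other learned functions to capture interactions and non-linear responses.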

Machine learning algorithms can broadly be divided into supervised and unsupervised approaches.

Supervised machine learning algorithms can be used for predictive analytics. In supervised machine learning, we train the algorithm on a labelled training set. For example, suppose we have the weather conditions for 100 days and want to model which days it would rain. We choose an algorithm and feed it data about what the conditions were like on days when it rained and on days when it did not. The trained model can then be applied to new data to predict whether it will rain.
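The rain example can be sketched in a few lines of Python. The version below uses a k-nearest-neighbours classifier, chosen here only because it is short; the article does not name a specific algorithm. The six labelled days stand in for the 100 mentioned above, and the features (humidity and cloud cover, both in percent) and the `predict_rain` helper are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical labelled training set: (humidity %, cloud cover %) -> did it rain?
training_days = [
    ((90.0, 85.0), True),
    ((85.0, 90.0), True),
    ((80.0, 75.0), True),
    ((40.0, 20.0), False),
    ((35.0, 30.0), False),
    ((50.0, 10.0), False),
]


def predict_rain(conditions, labelled_days, k=3):
    """Majority vote among the k training days closest to `conditions`."""
    nearest = sorted(
        labelled_days,
        key=lambda day: math.dist(conditions, day[0]),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Apply the "trained" model to new, unlabelled days.
humid_day = predict_rain((88.0, 80.0), training_days)
dry_day = predict_rain((45.0, 25.0), training_days)
print(f"Humid, cloudy day -> rain predicted: {humid_day}")
print(f"Dry, clear day    -> rain predicted: {dry_day}")
```

Both features share the same 0 to 100 scale, so plain Euclidean distance is reasonable here; with features on different scales you would normally rescale them first.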

## Predictive analytics generates great value from your data

Lately, the use of predictive analytics techniques has reportedly tripled. The question is no longer whether to adopt a predictive strategy, but whether you can afford the risk of not doing so, considering that: