Copyright © 2018 DataScience.US All Rights Reserved.
Demystifying Data Science
Artificial intelligence is just one of many different tools in the predictive analytics kit bag of a data scientist.
[Opening Scene]: Billy Dean is pacing the office. He’s struggling to keep his delivery trucks at full capacity and on the road. Random breakdowns, unexpected employee absences, and unscheduled truck maintenance are impacting bookings, revenues, and ultimately customer satisfaction. He keeps hearing from his business customers how they are leveraging data science to improve their business operations. Billy Dean starts to wonder if data science can help him. As he contemplates what data science can do for him, he slowly drifts off to sleep, and visions of data science start dancing in his head…
[Poof! Suddenly Wizard Wei appears]: Hi, I’m your data science wizard, here to help alleviate your data science concerns. I don’t understand why folks try to make the data science discussion complicated. Let’s start with a simple definition of data science:
Data science is about identifying those variables and metrics that might be better predictors of performance.
The key to a successful analytical model is having a robust set of variables to test for their predictive capabilities. And the key to building that robust set of variables is to get the business users engaged early in the process.
[A confused Billy Dean]: Okay, but I’m still confused. I mean, how does this really apply to my business?
[A patient Wizard Wei]: Well, let’s say that you are trying to predict which of your routes are likely to have under-capacity loads so that you can combine loads. In order to identify those variables that might be better predictors of under-capacity routes, you might ask your business users:
What data might you want to have in order to predict under-capacity routes?
The business users are likely to come up with a wide variety of variables, including:
| Customer name | Ship to location | Customer industry |
| Building permits | Customer tenure | Change in customer size |
| Customer stock price | Customer D&B rating | Types of products hauled |
| Time of year | Seasonality/Holidays | Day of week |
| Distance from distribution center | Open headcount on Indeed.com | Tenure of logistics manager |
The data science team will then gather these variables, perform some data transformations and enrichment, and then look for variables and combinations of variables that yield the best predictive results regarding under-capacity routes (see Figure 1).
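To make that workflow a little more concrete, here is a minimal Python sketch of the gather, transform, enrich, and test loop. Everything in it is an assumption made for illustration: the route records are hand-made toy data (not Billy Dean's actual routes), the names `HOLIDAY_WEEKS`, `make_routes`, `transform`, `score`, and `rank_variables` are hypothetical, and the scoring rule (predict the most common outcome within each group of variable values) is a deliberately simple stand-in for whatever model a real data science team would use.

```python
from collections import Counter, defaultdict
from itertools import combinations, product

# Assumed enrichment lookup: which calendar weeks count as holiday weeks.
HOLIDAY_WEEKS = {52}


def make_routes():
    """Build hypothetical raw route records (toy data, not real data).

    In this toy setup a route is under capacity only when it is long
    AND runs in a holiday week, so no single variable explains it.
    """
    days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
    return [
        {
            "day_of_week": day,
            "week": week,
            "distance_km": distance_km,
            "under_capacity": week in HOLIDAY_WEEKS and distance_km > 100,
        }
        for day, week, distance_km in product(days, [20, 52], [40, 400])
    ]


def transform(route):
    """Turn a raw record into candidate predictor variables."""
    return {
        "day_of_week": route["day_of_week"],
        # Enrichment: join the week number against the holiday lookup.
        "is_holiday_week": route["week"] in HOLIDAY_WEEKS,
        # Transformation: bucket the raw distance into two bands.
        "distance_band": "far" if route["distance_km"] > 100 else "near",
        "under_capacity": route["under_capacity"],
    }


def score(rows, variables, target="under_capacity"):
    """Accuracy of predicting the majority outcome within each value group."""
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[v] for v in variables)
        groups[key][row[target]] += 1
    correct = sum(max(counts.values()) for counts in groups.values())
    return correct / len(rows)


def rank_variables(rows, candidates, max_size=2):
    """Score every single variable and every combination up to max_size."""
    results = [
        (score(rows, combo), combo)
        for size in range(1, max_size + 1)
        for combo in combinations(candidates, size)
    ]
    return sorted(results, key=lambda result: result[0], reverse=True)


rows = [transform(route) for route in make_routes()]
ranking = rank_variables(
    rows, ["day_of_week", "is_holiday_week", "distance_band"]
)
for accuracy, combo in ranking:
    print(f"{accuracy:.2f}  {' + '.join(combo)}")
```

In this toy data, no single variable beats simply guessing "not under capacity" every time, while the combination of holiday week and distance band separates the routes cleanly. That is the point of testing combinations as well as individual variables. The sketch scores on the same rows it learns from, which flatters larger combinations; a real analysis would evaluate on held-out data.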