Copyright © 2017 DataScience.US All Rights Reserved.
Why Big Data and Data Scientists Are Overrated
What does it take to get value out of data?
Many organizations assume that you need a big collection of data and a highly skilled data scientist to spin all those 1s and 0s into dollar signs. In reality, companies need neither of those things to be successful with data.
One of the biggest mistakes that organizations can make with their data analytics projects is to assume they need a data scientist at the very beginning. According to Daniel Mintz, chief data evangelist with Looker, organizations are much better off starting lower on the data analytics food chain and working their way up as they gain proficiency.
“I’ve seen cases where people hired a data scientist way before they’re ready,” Mintz tells Datanami. “They don’t actually have any data, and even if they do, it’s dirty and dispersed across a whole bunch of places. The data scientist who doesn’t necessarily understand their business arrives and says ‘Where’s the nicely curated data set that you want me to use to solve problems?’ And they say, ‘Oh we didn’t know that was a prerequisite.’”
The fact is, data scientists spend about three-quarters of their time doing data janitorial work – collecting, transforming, and cleaning data – rather than building the complex predictive models that they were actually hired for. That equals frustration for data scientists who had high hopes of making an impact, and sour grapes for the people who hired them.
Organizations should start with the basics, and work up from there. Instead of being lured by the “shiny object” syndrome and thinking you need a big Hadoop data lake or neural networks to solve a problem, seek the simplest answer.
“People make a mistake if they jump right to the most sophisticated tool, because they’re wasting a lot of time,” Mintz says. “The reality is a lot of problems are quite tractable with a simple regression. And some problems don’t even need that. You can just look at the data and see what’s happening.”
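To make the "simple regression" point concrete, here is a minimal sketch of an ordinary least-squares line fit using only the Python standard library. The data, the variable names, and the fit_line helper are hypothetical and chosen for illustration; they do not come from Mintz or Looker.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: monthly ad spend (thousands of dollars) and new signups.
ad_spend = [1, 2, 3, 4, 5]
signups = [12, 19, 31, 38, 50]

slope, intercept = fit_line(ad_spend, signups)
print(f"Each extra $1k of spend is associated with about {slope:.1f} signups")
```

A fit like this takes a few lines and no special infrastructure, which is the kind of answer worth trying before reaching for a data lake or a neural network.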
Mintz’s personnel advice? Hire data generalists who can do the time-consuming data legwork that needs to be done before more highly skilled (and highly paid) data scientists come in to do their highly specialized thing.
“The really key skill is having somebody who can take what is fundamentally a business question and translate that into a data question,” says Mintz, who previously worked at MoveOn.org and other data-intensive operations. “That’s the key skill. When you’re not big enough to have specialists, the business people, who aren’t data people, will know what the right business questions are.”
Mintz recommends pairing a SQL-loving analyst with an ETL-loving engineer to start helping the business prepare itself to answer questions with data. As they document their data stores, define organization-specific metrics, and create workflows that transform and combine data in reliable and useful ways, they will start to see how the superpowers of the real data scientists could best be used.
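The kind of legwork described here, cleaning records, combining sources, and defining a metric, can be sketched in a few lines of Python. Everything in this example is an assumption made for illustration: the two source systems, the field names, and the "revenue by plan" metric are hypothetical.

```python
def clean_email(raw):
    """Normalize an email so the same customer matches across sources."""
    return raw.strip().lower()


# Hypothetical extracts from two systems that record the same customer differently.
crm_rows = [
    {"email": " Ana@Example.com ", "plan": "pro"},
    {"email": "bo@example.com", "plan": "free"},
]
billing_rows = [
    {"email": "ana@example.com", "amount": "49.00"},
    {"email": "ANA@example.com", "amount": "49.00"},
    {"email": "bo@example.com", "amount": ""},  # dirty record: missing amount
]


def revenue_by_plan(crm, billing):
    """Join billing to CRM on cleaned email and total revenue per plan."""
    plan_of = {clean_email(row["email"]): row["plan"] for row in crm}
    totals = {}
    for row in billing:
        if not row["amount"]:
            continue  # skip records with no amount rather than guessing
        plan = plan_of.get(clean_email(row["email"]))
        if plan is None:
            continue  # billing record with no matching CRM customer
        totals[plan] = totals.get(plan, 0.0) + float(row["amount"])
    return totals


result = revenue_by_plan(crm_rows, billing_rows)
print(result)
```

Writing down rules like "emails match after trimming and lowercasing" and "rows without an amount are excluded" is what turns a loosely stated business question into a documented, repeatable metric.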