A Data Professionals Community

Defining the Data Science Landscape


At events, in meetings and in general conversation, it’s struck me that many people use the terms data science, machine learning and artificial intelligence interchangeably. In passing that’s fine, but the three are distinct. Here, we look at how to define each of them and why they differ.

Data science is the craft of turning data into action. Data is being generated and, perhaps more importantly, digitally captured at unprecedented levels. However, abundant data represents only potential value; it has to be mined, refined and harvested. Data science is the process of extracting information, understanding and learning from raw data to inform decision making in a proactive, systematic and generalizable way. A key aspect of data science is the use of the scientific method: forming and challenging hypotheses to validate conclusions about underlying patterns in data.
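One way to picture "forming and challenging a hypothesis" is a permutation test. The sketch below is illustrative only: the function name, the sample values and the number of permutations are assumptions and do not come from any particular project. It asks how often a difference in group means at least as large as the observed one would appear if the group labels were assigned at random.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate how often shuffled labels give a mean gap at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    split = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # pretend the group labels carry no information
        gap = abs(mean(pooled[:split]) - mean(pooled[split:]))
        if gap >= observed - 1e-12:  # small tolerance for floating-point rounding
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical measurements for two groups
control = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
treated = [12.0, 11.8, 12.3, 12.1, 11.9, 12.2]
p_value = permutation_p_value(control, treated)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value suggests the pattern is unlikely to be an accident of how the rows were labelled. It does not, by itself, establish a causal relationship.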

Practicing data science requires combining a diverse set of skills. Data scientists need to be able to query and manipulate large swaths of data, so a strong computer science background is a must. Additionally, familiarity with mathematics and statistics helps form a strong understanding of the algorithms commonly deployed and tuned. Combining a lot of computing power with sophisticated algorithms is called data mining. However, a major hazard of this approach is the potential to mistake noise for signal. Domain expertise helps verify that the causal and logical relationships in models and conclusions make sense.
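The noise-for-signal hazard can be seen in a few lines of code. The following is a minimal sketch with assumed sizes (30 rows, 500 candidate features) and a hypothetical function name: every feature and the target are pure random noise, yet searching enough of them typically turns up one that looks noticeably correlated with the target.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_spurious_correlation(n_rows=30, n_features=500, seed=0):
    """Return the largest |r| between a random target and many random features."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]  # unrelated to target
        best = max(best, abs(pearson(feature, target)))
    return best


best = strongest_spurious_correlation()
print(f"strongest |r| among pure-noise features: {best:.2f}")
```

None of these features has any real relationship to the target, which is why a check from someone with domain expertise, or validation on held-out data, matters before acting on a pattern found this way.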

In general, a data scientist needs to know more about statistics than the average programmer and more about programming than the average statistician, and needs to be able to apply both skills to solve business problems.

The overall objective of data science may seem straightforward, but implementation is complex and involves a number of steps before the value of a data science product can be observed. Here’s what that looks like:

  1. Business Understanding
  2. Data Understanding
  3. Data Preparation
  4. Modeling
  5. Data science team evaluation
  6. Stakeholder evaluation
  7. Deployment

In the modeling stage, a data scientist applies statistical learning techniques (otherwise known as machine learning techniques or algorithms) to tease out details in the underlying raw data. Machine learning uses statistical computing to understand tendencies, patterns, characteristics, attributes and structure in the data, so as to inform future decisions about new observations. Rather than being hand-coded with specific instructions and custom rules, machine learning algorithms are “trained” on large pools of data and “induce” how to perform a specified task…
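To make the contrast between hand-coded rules and trained models concrete, here is a deliberately tiny sketch. The data, the fixed cutoff of 50.0 and the function names are all hypothetical, and a one-threshold classifier stands in for a real learning algorithm. The first function encodes a rule chosen by a person; the second derives its rule from labeled examples.

```python
def hand_coded_rule(x):
    # A fixed cutoff picked by a person (hypothetical value)
    return x > 50.0


def train_threshold(examples):
    """Pick the cutoff that misclassifies the fewest labeled examples."""
    values = sorted(x for x, _ in examples)
    # Candidate cutoffs: midpoints between neighbouring observed values
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(cutoff):
        return sum((x > cutoff) != label for x, label in examples)

    return min(candidates, key=errors)


# Hypothetical labeled observations: (measurement, is_positive)
training_data = [
    (12.0, False), (18.0, False), (25.0, False),
    (31.0, True), (40.0, True), (55.0, True),
]

threshold = train_threshold(training_data)


def learned_rule(x):
    # The cutoff was induced from the data rather than written by hand
    return x > threshold


print("learned cutoff:", threshold)
print("new observation 30.0 ->", learned_rule(30.0), "(hand-coded says", hand_coded_rule(30.0), ")")
```

If the data changes, retraining updates the learned cutoff automatically, whereas the hand-coded rule stays wrong until someone edits it.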
