A Data Professionals Community

An Introduction to Key Data Science Concepts

Here at Dataiku, we frequently stress the importance of collaboration in building a successful data team.


In short, successful data science and analytics are just as much about creativity as they are about crunching numbers, and creativity flourishes in a collaborative environment. One key to a collaborative environment is having a shared set of terms and concepts. Even if you aren’t working in data science per se, it’s still useful to familiarize yourself with these concepts — if you’re not already incorporating predictive analytics into your everyday work, you probably will be doing so soon!

The terms we’ve chosen to define here are commonly used in machine learning, and they’re essential to learning the basics of data science. (And many thanks to our own Ulysse Mizrahi, who helped us in compiling the list and writing the definitions!) Whether you’re working on a project that involves machine learning, or you’re learning about data science, or even if you’re just curious about what’s going on in this part of the data world, we hope you’ll find these definitions clear and helpful.

And when you’re finished, check out our introduction to machine learning algorithms.

Model: a mathematical representation of a real-world process; a predictive model forecasts a future outcome based on past behaviors.

Algorithm: a set of rules used to make a calculation or solve a problem.

Training: the process of creating a model from the training data. The data is fed into the training algorithm, which learns a representation for the problem and produces a model. Also called “learning.”

Regression: a prediction method whose output is a real number, that is, a value that represents a quantity along a line. Example: predicting the temperature of an engine or the revenue of a company.
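To make the training and regression definitions a little more concrete, here is a minimal sketch in plain Python. The data points, the `train` and `predict` function names, and the choice of a straight-line fit (ordinary least squares) are all assumptions made for illustration; they are one simple way to do regression, not the only one.

```python
# Hypothetical training data: (engine load in %, measured temperature in °C).
training_data = [(10, 52.0), (20, 61.0), (30, 69.5), (40, 80.5), (50, 89.0)]


def train(data):
    """Training algorithm: fit y = slope * x + intercept by ordinary least squares."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in data)
    variance = sum((x - mean_x) ** 2 for x, _ in data)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept  # the learned parameters are the model


def predict(model, x):
    """Apply the trained model to a new input; the output is a real number."""
    slope, intercept = model
    return slope * x + intercept


model = train(training_data)
print(predict(model, 35))  # predicted temperature for a load of 35%
```

Here `train` plays the role of the algorithm, the `(slope, intercept)` pair it returns is the model, and `predict` produces a real-valued output, which is what makes this a regression.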

Classification: a prediction method that assigns each data point to a predefined category, e.g., a type of operating system.
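As a rough illustration of classification, the sketch below assigns each data point to one of two predefined categories. Everything specific here is an assumption for the example: the device measurements, the "mobile" and "desktop" labels, the `train` and `classify` names, and the nearest-centroid technique itself, which is just one simple classifier among many.

```python
import math
from collections import defaultdict

# Hypothetical labeled data: (screen size in inches, weight in kg) -> category.
training_data = [
    ((6.1, 0.17), "mobile"),
    ((6.7, 0.22), "mobile"),
    ((14.0, 1.4), "desktop"),
    ((16.0, 2.1), "desktop"),
]


def train(data):
    """Compute one centroid (mean feature vector) per category."""
    grouped = defaultdict(list)
    for features, label in data:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(points) for column in zip(*points))
        for label, points in grouped.items()
    }


def classify(model, features):
    """Assign the category whose centroid is closest to the data point."""
    return min(model, key=lambda label: math.dist(model[label], features))


model = train(training_data)
print(classify(model, (6.4, 0.19)))  # a phone-sized device
print(classify(model, (15.0, 1.8)))  # a laptop-sized device
```

Unlike the regression case, the output is always one of the categories seen during training, never a number on a continuous scale.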

Target: called the dependent variable in statistics; it is the output of the model, or the variable you wish to predict….
