Copyright © 2018 DataScience.US All Rights Reserved.
Data Science & Machine Learning Platforms for the Enterprise
A resilient Data Science Platform is a necessity for every centralized data science team within a large corporation. It helps them centralize, reuse, and productionize their models at peta-scale. We’ve built Algorithmia Enterprise for that purpose.
You’ve built that R/Python/Java model. It works well. Now what?
“It started with your CEO hearing about machine learning and how data is the new oil. Someone in the data warehouse team just submitted their budget for a 1PB Teradata system, and the CIO heard that FB is using commodity storage with Hadoop, and it’s super cheap. A perfect storm is unleashed and now you have a mandate to build a data-first innovation team. You hire a group of data scientists, and everyone is excited and starts coming to you for some of that digital magic to Googlify their business. Your data scientists don’t have any infrastructure and spend all their time building dashboards for the execs, but the return on investment is negative and everyone blames you for not pouring enough unicorn blood over their P&L.” – Vish Nandlall (source)
Sharing, reusing, and running models at peta-scale is not part of the data scientist’s workflow. This inefficiency is amplified in a corporate environment where data scientists need to coordinate every move with IT, continuous deployment is a mess (if not impossible), reusability is low, and the pain snowballs as different corners of the company start to “Googlify their business”.
A Data Science & Machine Learning Platform is meant to fill that gap. It serves as the foundation layer on top of which three internal stakeholders collaborate: product data scientists, central data scientists, and IT infrastructure.
Fig. 1: A data science platform serves three stakeholders: product, central, and infrastructure. It is a necessity for large corporations with complex and growing reliance on machine learning.
In this post we’ll cover:
- Who needs a Data Science & Machine Learning (DS & ML) Platform?
- What is a Data Science & Machine Learning Platform?
- How to differentiate platforms?
- Examples of platforms
Do you need a Data Science Platform?
It’s not for everyone. Small teams with one or two use cases are better off improvising their own solutions around sharing and scaling (or using privately hosted solutions). If you’re a central team with many internal customers, you’re likely suffering from one or more of the following symptoms:
Symptom #1: You’re splitting code bases
Your data scientist creates a model (let’s say in R or Python) and wants to plug it into production to be used as part of a web or mobile app. Your backend engineers, who built their infrastructure with Java or .NET, end up re-writing that model from scratch in their technology stack of choice. Now you have two code bases to debug and synchronize. This inefficiency multiplies as you build more models over time.
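To make the alternative concrete, here is a minimal sketch of keeping the model in its original language and exposing it over HTTP, so a Java or .NET backend can call it instead of rewriting it. This is an illustration under assumptions, not a description of any particular platform: the linear "model", the names `predict`, `handle_request`, and `serve`, the JSON shape, and the port are all hypothetical, and it uses only Python's standard library.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a trained model: a fixed linear scorer.
WEIGHTS = [0.5, -0.25]
BIAS = 1.0


def predict(features):
    """Return a score for one feature vector."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


def handle_request(body):
    """Turn a JSON request body into (HTTP status, JSON response bytes)."""
    try:
        payload = json.loads(body)
        score = predict(payload["features"])
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, a missing key, or a bad feature vector is a client error.
        return 400, json.dumps({"error": str(exc)}).encode("utf-8")
    return 200, json.dumps({"score": score}).encode("utf-8")


class ModelHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper; all model logic stays in handle_request."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_request(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(port=8000):
    """Start the server on localhost (assumed port) and block until interrupted."""
    HTTPServer(("127.0.0.1", port), ModelHandler).serve_forever()
```

Calling `serve()` starts the endpoint. Any backend that can send an HTTP POST with a body like `{"features": [2.0, 4.0]}` can then use the model, and there is only one implementation of the model logic to debug. A real deployment would also need authentication, versioning, and scaling, which this sketch leaves out.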