
If Machine Learning is the question, open source is the answer. Right?

Why Google's gift of TensorFlow is not what it seems


Machine learning (ML) and artificial intelligence (AI) are extraordinarily hard to pull off in the real world, so of course the solution must be open source. From Google’s TensorFlow to Microsoft’s Cognitive Toolkit, the world is awash in open source ML/AI code… none of which seems to be closing the gaping void between AI hype and production deployment reality. By Gartner’s estimates, a mere 15 per cent of organisations actually get ML/AI into production.

A big reason for this gap is talent. Or, rather, a lack thereof. There’s a chance, however, that an influx of open-source code into the ML universe could improve things. How so? By lowering the barriers to experimenting with, and becoming proficient in, high-quality ML software. Perhaps not surprisingly, the cloud giants that stand to gain from an influx of data-heavy ML applications are the same ones open sourcing the ML code in the first place.

Mind the gap

Despite the incessant hype over ML’s promise to change everything forever, the reality is that ML has hardly managed to get out of neutral, much less first gear. As I’ve detailed before, the biggest barrier to ML success is a distinct lack of qualified engineers. Or as Gartner analyst Merv Adrian put it to me: “[I]t’s mostly about skills. Missing skills.”

What skills are missing? I’m glad you asked. There are a number of lists of must-have attributes for ML engineers, including this one: “[B]e aware of the relative advantages and disadvantages of different approaches, and the numerous gotchas that can trip you,” or this: “Be comfortable with failure,” not to mention a slew of algorithms.

Summarizing the brutal difficulty of sourcing the complete ML package, O’Reilly Media chief data scientist Ben Lorica and vice president Mike Loukides tell us all that’s needed is to find unicorns and a pot of gold at the end of the rainbow: “They frequently have doctorates in the sciences, with a lot of practical experience working with data at scale. They are almost always strong programmers, not just specialists in R or some other statistical package.

“They understand data ingestion, data cleaning, prototyping, bringing prototypes to production, product design, setting up and managing data infrastructure, and much more. In practice, they turn out to be the archetypal ‘Silicon Valley unicorns’: rare and very hard to hire.”

With such a deficit of expertise, it’s not clear how open source would help.

In the early days of Linux, the director of IBM’s Linux Technology Center told me that for open source to be successful, you had to have a sufficient body of developers with aptitude and interest in a given area. Every developer needs an operating system, for example, so there tends to be a large body of developers with interest and aptitude in contributing to something like Linux. Ditto databases, app servers (remember them?), and so on.

More recently, Apcera chief executive (and Cloud Foundry architect) Derek Collison told me: “Open source is a natural progression for ecosystems where there’s a lot of innovation and breakthroughs. The market eventually becomes democratized and open source alternatives emerge.”

Where open source doesn’t work, he declares, is when you go open source “[f]rom the start in an ecosystem that doesn’t even know what it means.”

Like, say, ML, where your odds of finding a qualified engineer are about the same as Lionel Messi signing for Millwall. It’s not going to happen.

After all, these open source ML frameworks come from the rarefied air of Google, Facebook and other unicorn-esque companies. It’s not clear that anyone else would know enough to be able to contribute to projects like TensorFlow, and not many more know how to use the software.

Which, ironically, may well be the point….
