Toward Broader AI Adoption: Deloitte’s ‘5 Vectors of Progress’
For all the talk about AI, it's somewhat surprising that only about 10 percent of companies have invested in machine learning. While the talk underscores AI’s great potential, the low adoption rate points to its daunting complexity and costs.
Certainly, AI (business intelligence, machine learning, deep learning) is candy to business managers, who salivate at the thought of applying it to their most vexing, time-consuming, high-ROI challenges. Although we hear that only hyperscalers, web companies and well-heeled oil and financial services firms have the deep compute, data and staff resources to use AI today, the good news is that enormous brainpower and money are directed at moving AI into the mainstream market.
In a new paper released today, Deloitte LLP, the consulting and audit firm, reports that progress is underway on addressing the key barriers between AI and broader adoption. The “five vectors of progress,” according to David Schatsky (Deloitte managing director) and Rameeta Chauhan (senior analyst), are:
- Automate tasks that occupy 80 percent of data scientists’ time
- Reduce high data volumes required for training via “transfer learning”
- Accelerate training through specialized chips (GPUs, etc.)
- Open the “black box” so users can see how machines arrive at decisions
- Embed machine learning into mobile devices
“There’s rapid improvement in (machine learning) tools and techniques,” Schatsky told EnterpriseTech. “This is going from something for specialists only to rapidly entering the toolkit of data teams and analytics teams and software development teams more broadly. Not next year, but the barriers to entry are dropping as the tools get better.”
Given the chronic shortage of data scientists, the high salaries they command and the criticality of their skill set, the most important of the five vectors, Schatsky said, is automating data preparation and other relatively low-level tasks that consume too much of their time.
“I think an area that will have the biggest near-term impact is probably data science automation because that so broadly applies to so many classes of problems,” said Schatsky. “Data scientists spend so much time on things that can be automated that that’s going to make a big impact… Any company doing machine learning or data science will want to get on board, because if they don’t they’re going to be wasting precious human resources.”
Data science, he said, is a mix of art, science and digital grunt work, and an increasing portion of the grunt work can be automated. This includes, Schatsky and Chauhan write:
- Data wrangling: pre-processing and normalizing data, filling in missing values, deciding whether to interpret the data in a column as a number or a date
- Exploratory data analysis: understanding the broad characteristics of the data to formulate hypotheses about it
- Feature engineering and selection: choosing the variables in the data that are most likely correlated with what the model is supposed to predict
- Algorithm selection: testing algorithms (hundreds, even thousands of them) to find those that produce the most accurate results
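To make these automatable steps concrete, here is a minimal sketch in Python using scikit-learn: missing values are imputed, features are normalized, and several model families are searched automatically. The dataset and the candidate algorithms are illustrative assumptions, not taken from the Deloitte paper or any specific automation platform.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Toy dataset with simulated missing values (an assumption for the sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),  # data wrangling: fill gaps
    ("scale", StandardScaler()),                 # normalization
    ("model", LogisticRegression()),             # placeholder, swapped below
])

# Algorithm selection: try multiple model families and settings automatically.
search = GridSearchCV(pipe, [
    {"model": [LogisticRegression(max_iter=1000)], "model__C": [0.1, 1.0]},
    {"model": [DecisionTreeClassifier()], "model__max_depth": [3, 5]},
], cv=3)
search.fit(X, y)
print(type(search.best_estimator_.named_steps["model"]).__name__)
```

A real automation platform would search far larger spaces of preprocessing steps and algorithms, but the shape of the work it takes off a data scientist's plate is the same.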
“Some people, when they hear about automating data science, they think: ‘Great, then you’re not going to need data scientists.’ But it’s really about multiplying the data scientists that you have, making them more efficient,” Schatsky said. “Anything that’s non-trivial is what you need a (data scientist) to do. You don’t need them spending 80 percent of their time on things that can be automated.”
In their paper, Schatsky and Chauhan cite Airbus, which, in building a customer value model, used an automation platform to test multiple algorithms and design approaches, something its data scientists didn’t have the time to do themselves. “A growing number of tools and techniques for data science automation, some offered by established companies and others by venture-backed start-ups, can help reduce the time required to execute a machine learning proof of concept from months to days. And automating data science means augmenting data scientists’ productivity, so even in the face of severe talent shortages, enterprises that employ data science automation technologies should be able to significantly expand their machine learning activities.”
Training Time & Data
Problems of system training are under attack from several directions, from reducing the data required to the development of specialized chips, such as GPUs, FPGAs, ASICs and Google’s Tensor Processing Unit. The authors cite accounts of a Microsoft research team that used GPUs to train, in one year, a speech recognition system the researchers say would have taken five years using CPUs alone.
Another AI-enabler: availability of accelerated processors and frameworks on public clouds.
“With every major cloud provider — including IBM, Microsoft, Google and Amazon Web Services — offering GPU cloud computing, accelerated training will become available to data science teams in any organization, increasing their productivity and multiplying the number of applications enterprises choose to undertake.”
System training can require millions of data elements, with all the cost and time of data acquisition and labeling that implies, and a number of promising techniques are in development to reduce that burden (including capsule networks, which we wrote about last week). The authors cite “synthetic data,” which is generated algorithmically to take on the characteristics of the real data. Using 80 percent synthesized data, a project team at Deloitte used a tool that helped build a model with only a fifth of the training data previously required.
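The synthetic-data idea can be sketched very simply: fit a generative model to a small real dataset, sample algorithmically generated rows that share its statistical characteristics, and mix them with the real rows. The Gaussian model and the dataset below are illustrative assumptions; the 80/20 mix mirrors the ratio mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Small "real" dataset (an assumption for the sketch): 50 two-column rows.
real = rng.multivariate_normal([1.0, -2.0], [[1.0, 0.3], [0.3, 2.0]], size=50)

# Fit a simple generative model (a Gaussian) to the real data.
mu = real.mean(axis=0)
cov = np.cov(real, rowvar=False)

# Generate synthetic rows with the same characteristics, then combine:
# 200 synthetic + 50 real, i.e. 80 percent synthesized training data.
synthetic = rng.multivariate_normal(mu, cov, size=200)
training_set = np.vstack([synthetic, real])
print(training_set.shape)  # (250, 2)
```

Production tools use far richer generative models than a single Gaussian, but the payoff is the same: a usable training set built from a fraction of the real data previously required.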
The authors also cite “transfer learning,” in which a machine learning model pre-trained on one dataset is adapted to learn a new task in a similar domain, such as language translation or image recognition, with far less training data.
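A minimal sketch of the transfer-learning idea, using scikit-learn rather than a deep-learning framework: a feature extractor is “pre-trained” on a large source dataset, then reused so that a related target task can be learned from only a handful of labeled examples. All of the datasets, the PCA extractor, and the dimensions here are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Features vary in importance; the same structure holds in both domains.
scales = np.linspace(3.0, 0.5, 20)

# Large source dataset: plenty of unlabeled data in a similar domain.
X_source = rng.normal(size=(5000, 20)) * scales
extractor = PCA(n_components=5).fit(X_source)  # the "pre-training" step

# Small target dataset: only 30 labeled examples for the new task.
X_target = rng.normal(size=(30, 20)) * scales
y_target = (X_target[:, 0] > 0).astype(int)

# Transfer: reuse the pre-trained representation, train only a small head.
features = extractor.transform(X_target)
clf = LogisticRegression().fit(features, y_target)
print(clf.score(features, y_target))
```

The pattern is the same one used at much larger scale with pre-trained neural networks: the expensive representation learning happens once on abundant data, and only a small task-specific layer is fit on the scarce labels.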
Opening up the “black box” of AI is a critical issue not only for users but also for the general public, alarmed by dark visions of machines run amok. In addition, some regulated industries, such as the U.S. banking industry, are mandated by Federal Reserve guidelines to interpret, and be able to explain, their systems’ decisions. The authors cite researchers at MIT who have developed neural network training methods that generate not only accurate predictions but also their rationales.
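One simple interpretability technique along these lines is permutation importance, which scores each input feature by how much shuffling it degrades the model's predictions. The model and data below are illustrative assumptions, and this is a different method from the rationale-generating training the MIT researchers describe.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
# Toy data (an assumption): only feature 2 actually drives the label.
X = rng.normal(size=(300, 4))
y = (X[:, 2] > 0).astype(int)

model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature in turn; a large score drop marks a feature
# the model genuinely relies on for its decisions.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(result.importances_mean.argmax())  # feature 2 should dominate
```

Techniques like this don't produce full natural-language rationales, but they do let users and regulators see which inputs a model's decisions hinge on.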
“Training data, interpretability and other problems are being tackled from a number of different directions,” said Schatsky, “so…you do get confidence that there are multiple strategies that companies are taking that are making significant progress along all these fronts.”
Companies looking to take their first steps into AI could view things through a supply-side and demand-side lens, Schatsky suggested.
“On the supply side, understand the tools and technologies available to use and to play with, and these are getting more accessible – between the hosted machine learning toolkits that all the cloud providers offer, to various open source frameworks, it’s getting easier to get started without making big investments.”
The demand side, he said, consists of workloads likely to be improved with machine learning techniques.
“I’d categorize them as the problems you’d like to tackle, where do you have data, where do you have decisions where incremental improvement could mean real business value,” he said. “The intersection of those two is a good area to start working on some pilots… It’s a good time to start rolling up your sleeves and start piloting things because tools are going to rise up to meet your needs.”