Data Science in a Box: Tools Attack Critical Skills Shortage
Begging forgiveness for the expression of antiquated gender attitudes, we share a passage from Pride & Prejudice (1813):
“…no one can be really esteemed accomplished who does not greatly surpass what is usually met with. A woman must have a thorough knowledge of music, singing, drawing, dancing, and the modern languages, to deserve the word…”
“All this she must possess,” added Darcy, “and to all this she must yet add something more substantial, in the improvement of her mind by extensive reading.”
“I am no longer surprised at your knowing only six accomplished women,” said Elizabeth. “I rather wonder now at your knowing any.”
Something like this can be said of the data scientist, the IT unicorn whose accomplishments should include “a graduate degree in computer science and expertise in mathematics, statistics, computer programming and business knowledge.”* One wonders at hiring managers knowing, or finding, any.
Instrumental to advanced analytics and machine learning, data scientists must have command of a kitchen sink of tasks: collecting, preparing and organizing large data sets in a variety of formats; developing and testing algorithms; building and implementing machine learning solutions; conducting data pattern analysis; explaining results to line managers, senior management and customers.
Oh heck ya, we can do that.
The kitchen-sink list of data science skills and knowledge, combined with scarcity, has driven stratospheric costs: data scientists fresh out of college are courted with offers in the high five figures; for experienced ones, pay reaches into the mid-six figures. A McKinsey report projects a U.S. data scientist shortfall of about 250,000 by 2024.
Meanwhile, getting value out of data is getting harder all the time. Panoply, a smart cloud data warehouse vendor, surveyed attendees at Amazon’s recent re:Invent conference and found that 66 percent said that, with the growing number of apps, databases and data sources to contend with, data is too difficult to manage; that data needs are outpacing their teams’ ability to keep up; and that disparate apps (Google Analytics, Facebook Ads, etc.) and services don’t work together well enough.
But technology, like nature, hates a void. New tools for automated machine learning, pre-trained AI models, data prep and other data science tasks comprise a growing software category. Deloitte Insights recently issued a report, “Democratizing Data Science to Bridge the Talent Gap” (by David Schatsky, Rameeta Chauhan and Craig Muraskin), painting an optimistic picture that begins by citing a study issued a year ago by Gartner predicting that by 2020 more than 40 percent of data science tasks will be automated. This may well be an achievable goal, considering that data mining company CrowdFlower has reported that 80 percent of what data scientists do is tedious, repetitive and ripe for some degree of automation.
“Without data science, companies can't get full value from data, and there aren't enough data scientists to go around,” write the Deloitte authors. “But automation and training are giving companies access to data science without having to wage a war for talent.”
A key point, Schatsky told us, is that while there’s a shortage of hands to get data science work done, some companies overestimate the need for data scientists.
“We’ve seen situations with clients who think we just have to get data scientists in here and then they’ll be on their way to solving some business problem,” Schatsky said, “but generally these problems are best solved with a cross functional team of people who understand the business, the business process, what the goals of the business are, plus people who understand the data and know what to do with it.”
A cross-functional approach, combined with automation, can mean “there may not be as much of a shortage as you think when you reflect on the tools that are making data science more efficient and effective and the educational opportunities aiming to impart sufficient data science skills to increase the supply within an organization…
“Data science is unquestionably important, but how you accomplish it we think is shifting and organizations should shift, they should take a broad approach to getting access to the power of data science through the use of tools to significantly increase their productivity, to develop new data science capabilities in your staff…, it’s a multipronged approach that we think makes sense for organizations that are trying to get a grip on their data to get value out of it.”
The Deloitte analysts report that, anecdotally, early adopters of data science automation tools are realizing time and cost savings, citing a ZDNet article reporting that Virgin Australia, using Boston-based DataRobot’s automated machine learning platform, has “cut down the time it takes to build predictive models by up to 90 percent, while boosting accuracy by up to 15 percent.”
Loyalty Lab, a Netherlands company that develops customer loyalty strategies, adopted an automated PredicSis.ai machine learning tool, available in the AWS Marketplace, and combined it with Amazon Redshift and Simple Storage Service technologies to develop ML capabilities with no AI specialists on staff, according to AWS.
Industry analyst firm IDC conducted a study on usage of Salesforce’s low code development Lightning Platform, finding that organizations had a 57 percent faster IT development lifecycle and, over five years of use, a 545 percent ROI.
Deloitte underscored five areas in which new tools put data science capabilities “in the hands of more professionals,” potentially alleviating a crippling talent shortage.
- Automated machine learning is targeted at tasks that include data preparation, feature engineering and selection, and algorithm selection and evaluation, according to Deloitte. This publication has written about data prep vendor Trifacta (see “Data Prep: Easing Data Scientists’ ‘Janitorial Work’”); other entries include tools announced by Google (AutoML Vision), an integrated data science and ML platform from IBM, and an Azure service from Microsoft that automatically builds AI models.
- Low-code app development platforms, designed for noncoders by combining GUIs, drag-and-drop modules and other software-defined, user-friendly elements, can speed up AI application development by as much as 10X. “For example, using a no-code platform, salespeople can build a machine learning-based tool themselves to provide product recommendations to customers based on cross-sell opportunities,” Deloitte reported. The authors cite estimates by industry watcher Forrester that the low-code development market is growing by 50 percent annually and was worth about $4 billion globally in 2017.
- Pre-trained AI models. AI software vendors and startups have introduced pre-trained AI models, “effectively packaging machine learning expertise and turning it into products,” Deloitte reported. Available from AWS, Google Cloud, Azure and other cloud platforms and delivered via cloud-based APIs, these models are generally applied to such use cases as image, video, audio, or text analysis, including sentiment analysis, sales opportunity workflow automation, customer service, automated equipment inspection, and online, interactive advertising. “They let application developers solve these problems without even solving them, they’re pre-solved, a new generation of APIs created through data science to do a discrete task,” Schatsky said.
- Self-service data analytics. Designed to “empower business users to perform complex data analysis and get quick access to customized insights” without help from data scientists and analytics specialists, these tools put data-based insights within reach of more users. Some self-service analytics tools automate development of machine learning models and include “natural language query and search, visual data discovery, and natural language generation (that) help users automatically find, visualize and narrate data findings like correlations, exceptions, clusters, links and predictions.”
- Accelerated learning. Not an automation tool, these are training courses and boot camps, lasting from days to months, where IT professionals with math and coding skills go to delve into the details of data science and AI.
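To make the “algorithm selection and evaluation” piece of automated machine learning concrete, here is a minimal, hypothetical sketch in plain Python: an automated loop fits several candidate models to training data, scores each on a held-out validation set, and keeps the best performer. The toy models and synthetic data are illustrative assumptions, not any vendor’s product; commercial platforms automate the same loop over far larger model and feature spaces.

```python
import random

# Toy "automated algorithm selection": fit each candidate model on the
# training split, score it on the held-out split, keep the best scorer.
# The models and data below are illustrative only.

random.seed(0)

# Synthetic data: y is roughly 3*x plus noise, so a linear model should win.
data = [(x, 3 * x + random.uniform(-1, 1)) for x in range(100)]
train, valid = data[:80], data[80:]

def fit_mean(points):
    """Baseline model: always predict the mean of the training targets."""
    mean_y = sum(y for _, y in points) / len(points)
    return lambda x: mean_y

def fit_linear(points):
    """Least-squares fit of y = a*x + b."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return lambda x: a * x + b

def mse(model, points):
    """Mean squared error of a fitted model on a data split."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)

candidates = {"mean_baseline": fit_mean, "linear_regression": fit_linear}
scores = {name: mse(fit(train), valid) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(best)  # the linear model should beat the baseline on this data
```

The point of the sketch is the shape of the automation, not the models: a human chooses nothing per-run, yet the pipeline still ends up with the better algorithm for the data.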
While installation of these tools generally can be handled by IT managers capable of installing enterprise software, Schatsky said, their implementation isn’t necessarily as straightforward.
“Since a lot of the technological advancements in this area have happened recently, enterprises may encounter resistance to using these solutions,” the authors noted. “Business users may not be ready to trust them, preferring to continue relying on intuition and traditional decision-making processes. Technical experts, by contrast, may resist changing their workstyle and automating tasks they think of as requiring expert craftsmanship.”
Deployment can be another challenge.
“Without proper onboarding and training, users provided access to data science automation and self-service tools may fail to derive relevant insights or misinterpret or misapply the results in decision-making. Wide adoption of these tools will necessitate instituting governance procedures that run the risk of becoming bottlenecks. Inadequate data controls and governance practices in enterprises may lead to creation of information silos, bad analysis, and lack of accountability. Thus, companies need to prepare to address these challenges before moving forward with data science democratization.”
* “This Is America’s Hottest Job,” Bloomberg, May 18, 2018