Advanced Computing in the Age of AI | Friday, March 29, 2024

Data Unification: A New Path to Digital Transformation 

You may be familiar with the cliché that data is the new oil and analytics is the combustion engine that will power enterprises to corporate nirvana with all manner of game-changing insights. But for most large businesses, enterprise data is like the tar sands. It’s there and everyone believes in its huge potential, but it’s incredibly hard to get at without enormous amounts of effort, time and cost using traditional extraction methods. And feeding unprocessed ‘tar sands’ of data into a fancy analytic combustion engine has predictably disastrous results.

But there’s a new way to transform siloed, dirty enterprise data at scale into a clean, efficient source of fuel to power digital transformation. It’s called “data unification” technology. It leverages machine learning techniques and reinvents traditional data management capabilities, such as those found in MDM and ETL processes, to meet the requirements of the Big Data era.

Traditional data management techniques are adequate when datasets are static and relatively few. But they break down in environments of high volume and complexity. This is largely due to their top-down, rules-based approaches, which often require significant manual effort to build and maintain.

Data unification technology flips this model on its head – focusing on connecting and mastering datasets through the use of human-guided machine learning, which leverages signals in the data to determine how it should integrate. Using automation guided by human intelligence to integrate and master datasets drives substantial benefits around speed, scale and data model flexibility while ensuring the highest levels of accuracy and trust in the results. At its most fundamental level, data unification brings the promise of machine learning to the preparation of datasets at scale.

Part Human, Part Machine

Data unification fuses the worlds of automation and human expertise to capitalize on the benefits of each in preparing datasets for analysis. By using machine learning-based automation to recommend how attributes and records should be matched, organizations benefit significantly along the dimensions of speed and scale. This approach makes it more efficient to connect a growing number of data sources, addressing one of the most significant needs in large, fragmented IT environments.

Coupling this with expert validation from a human familiar with the business context surrounding the data ensures accuracy, and consequently trust, in the results. Trust is critical, and its absence is another significant obstacle in most data and analytics projects. If there is a perception that the data quality is poor or that a significant amount of data isn’t being incorporated, consumers will lose confidence in analytic results and not act on them.

Expert validation within data unification has another key benefit: feedback from the expert is incorporated into the technology’s algorithms so that, as future datasets are added, more of the process can be automated.
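
To make the human-in-the-loop idea concrete, here is a minimal sketch in Python. It is not how any particular data unification product works: the string-similarity score, the function names (similarity, triage, recalibrate) and the starting thresholds are all assumptions chosen for illustration. High-scoring pairs are matched automatically, low-scoring pairs are rejected, and the uncertain middle band goes to an expert whose answers then narrow that band for future datasets.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (a stand-in for a learned model)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def triage(pairs, auto_match=0.9, auto_reject=0.5):
    """Split candidate record pairs into matched, rejected and needs-review lists."""
    matched, rejected, review = [], [], []
    for a, b in pairs:
        score = similarity(a, b)
        if score >= auto_match:
            matched.append((a, b, score))
        elif score <= auto_reject:
            rejected.append((a, b, score))
        else:
            # Uncertain pairs are routed to a human expert.
            review.append((a, b, score))
    return matched, rejected, review


def recalibrate(labels, auto_match, auto_reject):
    """Narrow the review band using expert labels given as (score, is_match) tuples."""
    confirmed = [score for score, is_match in labels if is_match]
    denied = [score for score, is_match in labels if not is_match]
    new_match = min([auto_match] + confirmed)
    new_reject = max([auto_reject] + denied)
    if new_reject >= new_match:
        # The expert's answers overlap, so the score alone cannot separate
        # matches from non-matches; keep the existing thresholds.
        return auto_match, auto_reject
    return new_match, new_reject


if __name__ == "__main__":
    # Hypothetical customer names from two source systems.
    candidates = [
        ("Acme Corp", "acme corp "),
        ("Acme Corp", "Acme Corporation"),
        ("Acme Corp", "Zenith Ltd"),
    ]
    matched, rejected, review = triage(candidates)
    print("auto-matched:", matched)
    print("auto-rejected:", rejected)
    print("sent to expert:", review)
```

A production system would replace the single similarity score with a trained model over many attributes, but the division of labor is the same: the machine handles the clear cases and the expert's answers reduce how many cases need review next time.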

Real World Applications

Data unification is broadly applicable in large enterprises, whether they are stitching together customer data to identify cross-sell or upsell opportunities or generating unified views of their suppliers to surface cost savings.

For instance, financial services organizations use data unification to construct a unified view of their clients, quickly gaining visibility into sales opportunities and identifying potential risks. Because human-in-the-loop automation is being used to master customer entities across all datasets, this can be accomplished in a fraction of the time and at a fraction of the cost of traditional approaches. What may have taken multiple weeks to prepare can now be done in a day and requires significantly fewer internal resources. Equally important, because data unification is built to scale, all data sources valuable to the outcome can be included, whether they are internal or external. This scalability means far more of the available insight can be surfaced, ultimately leading to better decisions and the identification of significant revenue opportunities.

Manufacturing organizations utilize data unification to connect, master and classify suppliers, materials and parts to optimize procurement and sourcing operations. These applications leverage the same human-guided unification process, the only difference being the target domain and use case for the technology. In many of these instances, the IT environment is so large and fragmented that most enterprises never even tried to solve the problem. Now, armed with unified views of suppliers and parts, organizations are identifying and capitalizing on major cost savings opportunities.

Data Unification + Self-Service Data Preparation

Self-service data preparation products round out the data preparation pipeline and work in a complementary manner to data unification offerings. While data unification focuses on working with IT and the business to set up production pipelines for integrating, mastering and classifying very large, complex and fragmented enterprise data systems, most self-service data preparation tools focus on giving business analysts the capabilities to massage such unified datasets to meet their individual needs. This would include selecting and removing irrelevant records, detecting anomalies in the dataset, finding/replacing values and so on.
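
The analyst-level operations described above can be sketched in a few lines of Python. This is an illustration only: the supplier records, field names and helper functions (drop_irrelevant, replace_values, flag_anomalies) are hypothetical, and the anomaly check uses a simple median-absolute-deviation rule rather than anything a specific tool provides.

```python
from statistics import median

# Hypothetical, already-unified supplier records.
records = [
    {"supplier": "Acme", "country": "USA", "spend": 100.0},
    {"supplier": "Borealis", "country": "U.S.", "spend": 110.0},
    {"supplier": "Cobalt", "country": "USA", "spend": 105.0},
    {"supplier": "Dynamo", "country": "U.S.", "spend": 95.0},
    {"supplier": "Everest", "country": "USA", "spend": 5000.0},
    {"supplier": "Test Vendor", "country": "USA", "spend": 0.0},
]


def drop_irrelevant(rows, keep):
    """Select records: keep only the rows for which keep(row) is true."""
    return [row for row in rows if keep(row)]


def replace_values(rows, field, mapping):
    """Find/replace: map known variants of a field value to a canonical form."""
    return [{**row, field: mapping.get(row[field], row[field])} for row in rows]


def flag_anomalies(rows, field, k=5.0):
    """Flag rows whose value is more than k median absolute deviations from the median."""
    values = [row[field] for row in rows]
    med = median(values)
    mad = median(abs(value - med) for value in values)
    if mad == 0:
        return []
    return [row for row in rows if abs(row[field] - med) > k * mad]


if __name__ == "__main__":
    cleaned = drop_irrelevant(records, lambda r: not r["supplier"].startswith("Test"))
    cleaned = replace_values(cleaned, "country", {"U.S.": "USA"})
    print("cleaned:", cleaned)
    print("anomalies:", flag_anomalies(cleaned, "spend"))
```

Each step is small and local to one analyst's needs, which is why these operations complement rather than replace the upstream unification pipeline.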

In certain projects where datasets are already fairly easy to join, reasonable in size and include few, or easily identified, duplicate records, these self-service data preparation tools could prove to be sufficient end-to-end. However, in large projects where data sources are dirty and fragmented, data unification is often needed as a precursor to self-service data preparation if the goal is to architect a fast, scalable and trusted data preparation pipeline.

Challenging the Status Quo

For enterprises seeking to build data environments fit for the modern era, challenging the status quo is a must. New obstacles often require new ways of thinking, as traditional approaches will only go so far. Advances in areas such as machine learning will prove to be key to solving the most difficult enterprise data challenges. An organization facing the daunting task of building a scalable data pipeline needs to consider the benefits of new technologies, like data unification and self-service data preparation, if it is to tap the latent energy buried in its data to power its digital transformation initiatives.

This article was written by Michael Collins, product development, Tamr.

EnterpriseAI