Global Data Fabric Takes on the Diversity of Data Types
Life is a journey, but getting to your data shouldn’t have to be.
Data is the lifeblood of most businesses, but many organizations aren’t taking full advantage of their data because it’s complex, globally distributed and hard to access.
From high-resolution sensors on the edge to industrial IoT devices, unstructured data is exploding. More data sources are emitting more data than ever before. This diversity of data types spawns new silos for specific kinds of processing, creating islands of data that are not easily applied to operational use. Without a mechanism to collect, analyze and apply the results to operational systems, much of that value is lost.
Global Data Fabric
What’s needed is a modern global data fabric that allows organizations to use data regardless of its location or format.
Data fabric refers to technology that supports the processing, analysis, management and storage of disparate data: files, tables, streams, objects, images, and even edge or sensor data. Through the data fabric, applications and tools access that data over standard interfaces such as a REST API, NFS, or the HDFS API.
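A minimal sketch of what multi-interface access means in practice. Here a local temporary directory stands in for an NFS mount of the fabric's global namespace, and the REST and HDFS addresses are hypothetical illustrations, not a specific product's endpoints:

```python
# Sketch: one record, reachable through standard interfaces.
# The temp directory below stands in for an NFS mount of the fabric's
# namespace; a real deployment would expose a path like /mapr/cluster/...
import json
import os
import tempfile

mount = tempfile.mkdtemp()  # stand-in for the NFS mount point
record = {"sensor": "arm-17", "status": "exception"}

path = os.path.join(mount, "exception.json")
with open(path, "w") as f:   # plain POSIX file I/O, as over NFS
    json.dump(record, f)

with open(path) as f:        # any tool that reads files can read it back
    assert json.load(f) == record

# The same object would also be addressable via other interfaces
# (hypothetical URIs for illustration only):
rest_url = "https://fabric.example.com/rest/sensors/exception.json"
hdfs_uri = "hdfs:///sensors/exception.json"
```

The point of the sketch is that the application uses ordinary file I/O and never needs fabric-specific client code for the NFS path.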
This data fabric both modernizes an organization’s data management strategy and unlocks the business value to transform day-to-day operations. It should provide three capabilities (see below): global strong consistency, continuous coordinated data flows and location awareness.
Consider the roadblocks to mining value from data today. You likely have to move the data from one place to another, which requires the manual intervention of an IT administrator. Then numerous access points and permission requirements must be navigated before you can reach that data. At any point this process can break down, causing delays and disruptions to your business.
Global Strong Consistency
Global strong consistency, meaning data is consistent across the cluster in terms of time, availability and lineage, ensures that your data is accurate and available no matter where it sits within your organization.
For example, an analyst in Cleveland may need to run a query on an exception report generated by a robotic arm in Munich. To detect anomalies, the data from that robotic arm must be consistent with the data in your warehouse. With global strong consistency, the data is consistent between the device, in this case the robotic arm, and the data center, and the fabric keeps it available and consistent regardless of where it originates or where it is accessed.
Continuous Coordinated Data Flows
Continuous coordinated data flows refers to the ability to coordinate new analytic data with historical data. Return to the robotic arm: say it missed a chip placement on the production line and generated an exception report. Instead of replacing the arm at the first sign of trouble, you can compare this event with that specific arm's history.
Perhaps the entire arm doesn’t need to be replaced because this error is well within the arm’s one-in-a-million failure rate. The only way you’re going to know this is if you can access data in real time through continuous coordinated data flows between real-time analytic data at the source (the robotic arm) and historical data (the data warehouse in the data center).
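The decision described above can be sketched in a few lines. The record shapes, field names and threshold here are hypothetical stand-ins for the real-time event from the edge and the historical counts from the data warehouse:

```python
# Sketch: join a real-time exception event with historical failure counts
# to decide whether the arm actually needs replacing.
# (Record shapes and the threshold are hypothetical.)

# Historical counts, as they might come from the data warehouse:
historical = {"arm-17": {"placements": 4_000_000, "failures": 3}}

# Real-time event, as it might arrive from the edge device:
event = {"arm": "arm-17", "type": "missed_placement"}

stats = historical[event["arm"]]
# Include the new failure and placement in the observed rate:
observed_rate = (stats["failures"] + 1) / (stats["placements"] + 1)

# Flag only if the rate is well above the expected one-in-a-million:
THRESHOLD = 5e-6
needs_replacement = observed_rate > THRESHOLD
print(needs_replacement)  # → False: this error is within the arm's history
```

Here the arm's observed rate stays near one in a million, so the exception alone does not justify a replacement, which is exactly the kind of call that requires both data sources at once.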
Location Awareness
With location awareness, you always know exactly where data is and where it originated within your data fabric. Unlike data dumped into a data lake, location-aware data carries context and lineage. Knowing the data's point of origin, what it refers to and where it falls in the data hierarchy gives your data value and lets you mine insight from it.
Going back to your plant in Munich, knowing that the exception report originated from a specific sensor in your robotic arm helps you replace the right sensor in the correct robotic arm, as opposed to simply having only a general awareness that some sensor somewhere in the world is malfunctioning.
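One way to picture location awareness is as lineage metadata carried with every record. The field names and helper below are hypothetical, not a specific product's schema:

```python
# Sketch: lineage metadata attached to each record, so consumers know
# exactly which site, device and sensor produced it. (Hypothetical schema.)
from datetime import datetime, timezone

def tag_with_lineage(payload, site, device, sensor):
    """Wrap a payload with origin metadata for downstream consumers."""
    return {
        "payload": payload,
        "lineage": {
            "site": site,
            "device": device,
            "sensor": sensor,
            "ingested_at": datetime.now(timezone.utc).isoformat(),
        },
    }

report = tag_with_lineage(
    {"error": "missed placement"},
    site="munich-plant", device="arm-17", sensor="servo-3",
)

# A maintenance workflow can now target the exact sensor:
print(report["lineage"]["device"], report["lineage"]["sensor"])
```

With lineage like this attached, "some sensor somewhere is malfunctioning" becomes "servo-3 on arm-17 at the Munich plant."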
With a global data fabric, as long as you are authorized to access the data, you can run a query on data anywhere in the world without moving it and without even knowing where it is. You don't need to file a request with IT for access or wait for the data to be moved into a data warehouse. Your data can be in the cloud, on premises or in a data center anywhere in the world.
The data fabric and the underlying file system give you the ability to do this, providing transparency not only to the end user but to the application as well. The application running the query doesn’t need to know that the data is not in the local data center or in a robotic arm sensor. Accessing data anywhere with complete transparency to the end user and the application allows you to quickly operationalize your data. Speed, reliability and scalability are key benefits of a data fabric environment.
Moving data in 2017 is still difficult. But when data is transparent within a data fabric, it just works, and you may not have to move the data at all.
Bill Peterson is senior director, industry solutions, MapR Technologies.