
Metadata is the New Data: The Power of Metadata Visualization 


Organizing data is a challenge. According to International Data Corporation (IDC), the digital universe doubles in size every two years and was projected to reach 44 trillion gigabytes by 2020.1 IDC’s more recent estimates predict it will grow to 180 trillion gigabytes by 2025.

In most organizations, the amount of unstructured data has grown to unprecedented levels. In a recent Enterprise Strategy Group (ESG) study, 47 percent of respondents said unstructured data made up more than 50 percent of their capacity. In addition, 46 percent said unstructured data is growing more than 20 percent annually.2

Convenience often determines where data is placed. It’s common for an average organization to have data spread across 15 to 30 different systems, according to Forrester Research analyst Brendan Witcher.3 And he was just referring to customer data.

What does all this mean? It means files are scattered; enterprises are storing structured and unstructured data on many different platforms, file systems and storage devices. As a result, files are hard to find. Time is wasted searching, productivity suffers, and as the volume of data grows, the problem gets worse.

Silos of files create a data management problem that metadata can address. Using metadata, you can turn isolated silos of files into an asset with business value.

Logical Data Tagging

Metadata is data that provides information about other data. Author, date created, file size, location, customer, associated projects, expiration date and date modified are all examples of metadata because the information describes and adds context to other data.

Metadata can be created manually, or it can be generated by an application. Once the metadata is created, it is connected to a file through a process called “tagging.” How you use metadata will depend on your business needs. But if you tag your data logically, you can scan all your files and search for tags that satisfy certain criteria, as in the sketch below.
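To make the idea concrete, here is a minimal sketch of logical tagging in Python. It assumes a simple JSON “sidecar” index rather than any particular product’s tagging API, and the file paths and tag names are invented for illustration; production systems typically store tags in extended attributes or a metadata database.

```python
import json
from pathlib import Path

# Hypothetical sidecar index mapping file paths to metadata tags.
INDEX_FILE = Path("metadata_index.json")

def load_index() -> dict:
    return json.loads(INDEX_FILE.read_text()) if INDEX_FILE.exists() else {}

def tag_file(path: str, **tags) -> None:
    """Attach key/value metadata tags to a file."""
    index = load_index()
    index.setdefault(path, {}).update(tags)
    INDEX_FILE.write_text(json.dumps(index, indent=2))

def find_files(**criteria) -> list[str]:
    """Return paths whose tags satisfy every given criterion."""
    index = load_index()
    return [path for path, tags in index.items()
            if all(tags.get(k) == v for k, v in criteria.items())]

# Usage: tag an output file, then search by project.
tag_file("results/run42.csv", project="genome-a", author="mpastor",
         expires="2025-01-01")
print(find_files(project="genome-a"))
```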

A Unified View of Data

Applications, computational analysis, sensors, video cameras, genomic sequencers, and many other types of processes and intelligent devices all generate data. In a common technical workflow, raw data is generated and then ingested from one or more sources. That data then undergoes multiple processing steps, often involving several different systems. Once all processing is complete, an output product is produced. As a result, pieces of a project get scattered across different storage systems.

How do you track all the files associated with a project? How do you determine the amount of storage one project consumes? When files are spread across different storage systems, those are difficult questions to answer.

If you tag files with meaningful metadata, you have the information available to answer those questions. You can scan the metadata of all your files and organize it in a way that provides insight. Then you can use the information to better manage your files and your storage.
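Continuing the hypothetical sidecar index from the sketch above, answering “how much storage does each project consume?” becomes a simple scan-and-aggregate pass:

```python
import os
from collections import defaultdict

def storage_by_project(index: dict) -> dict:
    """Sum on-disk bytes per 'project' tag across all indexed files."""
    totals = defaultdict(int)
    for path, tags in index.items():
        project = tags.get("project", "untagged")
        if os.path.exists(path):            # skip moved or deleted files
            totals[project] += os.path.getsize(path)
    return dict(totals)

# Example with a hand-built index; in practice this would come from
# the tagging step above.
index = {"results/run42.csv": {"project": "genome-a"}}
print(storage_by_project(index))
```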

But first you need a unified view of your data. You need visibility into all your storage from a single management view, regardless of platform or file structure differences. That way, you know you’re indexing all the files in your enterprise.

Storing Data in the Best Place

Almost half of the organizations surveyed by ESG said unstructured data is growing by more than 20 percent annually, and that growth compounds quickly: if an organization’s data grows 27 percent annually, its storage footprint doubles in about three years.
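The doubling claim is just compound growth; a quick check (the 27 percent figure comes from the sentence above):

```python
import math

annual_growth = 0.27                                  # 27 percent per year
doubling_years = math.log(2) / math.log(1 + annual_growth)
print(f"Footprint doubles every {doubling_years:.1f} years")  # ~2.9 years
```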

Organizations typically implement a storage infrastructure based on their initial needs. Too often, data is simply stored in the easiest way possible with little thought to cost and future growth. This approach can be very expensive and lead to funding problems. Nearly 9 out of 10 respondents to the ESG survey said they expect to encounter challenges funding investments needed to keep up with unstructured data growth.4

Poor storage planning leads to overspending on infrastructure. If you don’t know where your data is or if you’re not managing it efficiently, then it’s tempting to install more high-performance storage than you really need because it’s easy. In addition, these initial infrastructure designs are often strained when storage reaches petabyte levels because they were never designed to accommodate the challenges of backing up and managing such a large storage environment.

Data should be stored where it can add the most value to your organization. In a scientific research setting, for example, performance matters. High-performance computing solutions must enable data to be analyzed very quickly. Storage performance must keep up. Thus, active files should reside on high-performance disk or flash storage. When data is inactive, it should be stored on less expensive media to lower storage costs.

Multi-tier storage is designed to optimize both performance and cost. In a multi-tier configuration, total storage capacity is distributed across different media. There is high-performance disk or flash storage for active files, and the remainder of the capacity consists of lower cost storage, such as cloud, object storage, disk or tape.

Metadata plays an important role in a multi-tier environment. Files are moved between tiers based on systems metadata and user-defined policies. And as long as the files remain visible, the metadata can be searched to find files whenever they are needed, regardless of where they are stored in the infrastructure. Thus, storage costs are minimized, and productivity is improved.
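A sketch of what a metadata-driven tiering policy might look like follows. The 90-day threshold and the directory walk are illustrative assumptions, not any vendor’s actual policy engine; a real system would relocate the data while keeping the file visible in the namespace.

```python
import time
from pathlib import Path

ARCHIVE_AFTER_DAYS = 90   # illustrative policy threshold

def archive_candidates(root: str):
    """Yield files whose last-access time exceeds the policy threshold."""
    cutoff = time.time() - ARCHIVE_AFTER_DAYS * 86400
    for p in Path(root).rglob("*"):
        if p.is_file() and p.stat().st_atime < cutoff:
            yield p

# This sketch only identifies candidates; a real tiering engine would
# move the data to cheaper media and leave a stub in the namespace.
for path in archive_candidates("."):
    print("move to cold tier:", path)
```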

Metadata Visualization

If you’ve done a good job of tagging your files, you can now organize your data using metadata visualization: scanning the metadata of your files and presenting the results in charts or graphs for analysis and informed storage decisions. It lets you manage files based on logical groupings instead of just physical ones. You can determine where big pockets of files are stored. You can see how much storage is consumed by each project. And when you have completed a project, you can find all the associated files and archive them to tape or cloud storage, freeing up more expensive storage space.
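As one illustration, the per-project totals from a metadata scan plot naturally as a bar chart. The numbers below are invented; in practice they would come from an aggregation like the storage_by_project() sketch above.

```python
import matplotlib.pyplot as plt

# Invented figures for illustration (GB consumed per project tag).
usage = {"genome-a": 412, "genome-b": 187, "instrument-raw": 264}

plt.bar(list(usage), list(usage.values()))
plt.ylabel("Storage consumed (GB)")
plt.title("Storage by project, from a metadata scan")
plt.tight_layout()
plt.show()
```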

Mark Pastor is director of data intelligence solutions at Quantum.

Sources:

1 Reinsel, David; Gantz, John; Rydning, John. Data Age 2025: The Evolution of Data to Life-Critical. Don’t Focus on Big Data; Focus on the Data That’s Big. International Data Corporation White Paper, sponsored by Seagate. April 2017.

2 Survey on Unstructured Data and its Implications. Enterprise Strategy Group, 2017.

3 Nicastro, Dom. “Let's Get Personal: Content Experts Share Their Advice.” CMSWire.com. Posted March 2, 2016. Viewed October 30, 2017. https://www.cmswire.com/digital-experience/lets-get-personal-content-experts-share-their-advice/

4 Survey on Unstructured Data and its Implications. Enterprise Strategy Group, 2017.

5 Ibid.
