A conversation about metadata management featuring Forrester Analyst Michele Goetz
I recently had an opportunity to speak with Forrester Principal Analyst Michele Goetz following a research study on the subject of metadata management conducted by Forrester Consulting and commissioned by IBM. Michele’s research covers artificial intelligence technologies and consultancies, semantic technology, data management strategy, data governance and data integration, and includes within its scope the topic of metadata management.
Metadata management is a topic close to my heart. As the product marketing lead for IBM Storage for AI and Big Data, I’m keenly aware of and interested in the crucial role that metadata plays both in the optimization of storage for unstructured data and in the curation and preparation of data for large-scale analytics and data science. But in conversations with clients, I sometimes sense that not everyone is equally aware of how much metadata management could contribute to their success. So, I explored this and other questions with Michele. What follows is a transcript of our conversation.
DH: Michele, from your vantage point, would you say that most businesses have a good understanding of why metadata management is important and, more importantly, what it can do for their business?
MG: Metadata management is still an abstract concept for most enterprises. The old explanation that “metadata is data about data” has obscured its importance. When business stakeholders work on ways to label data, the tags, classifications, and relationships are all metadata, but they don’t call it that. It’s just data. For technology members, metadata represents where data is located, its physical format, and the logical aspects of data in tables and files. Only when data is to be governed, secured, and placed in context for analytics, decisions, and business actions does metadata move from an abstract concept to a strategic artifact. As companies prioritized the establishment of data lakes and moving data to the cloud, investments in data catalogs that hold metadata were the first investments after security. That metadata ensured improved management, lower management cost, faster activation, and support for data analysts to find and consume data independently. In 2019, businesses are starting to take metadata strategies a step further, as the ‘data about data’ is what intelligent and automated systems need to adapt and service data for personalization in customer experience or autonomous decisioning in process automation use cases. But we learned from the IBM-commissioned study that only 8% of companies are fully capitalizing on their existing data in storage. Metadata is the key to powering digital with data.
DH: For those who may not know, will you explain what metadata is and, perhaps more importantly, why it matters?
MG: Metadata is the data that tells you what you need to know about information. For a storage analyst, metadata describes the physical traits of the content: what it is, how large it is, and where it is stored in systems. Data management teams assign logical metadata to information that helps organize and find data in repositories. They also add metadata to assign security, privacy, legal hold and business labels to make data understood and useful. Without metadata, information is difficult to find and interpret, and is ultimately left cold on the shelf.
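To make the distinction between physical metadata and assigned labels concrete, here is a minimal Python sketch. It is an illustration only, not something discussed in the interview: the `FileMetadata` record, the `describe` helper, and the example labels (`owner`, `contains_pii`) are all hypothetical names chosen for this example.

```python
import os
import tempfile
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FileMetadata:
    """Hypothetical record combining physical and assigned metadata."""
    # Physical metadata: read directly from the file system.
    path: str
    size_bytes: int
    modified_utc: datetime
    # Logical and business metadata: assigned by people or tools.
    labels: dict = field(default_factory=dict)


def describe(path: str, **labels: str) -> FileMetadata:
    """Collect physical traits of a file and attach caller-supplied labels."""
    info = os.stat(path)
    return FileMetadata(
        path=os.path.abspath(path),
        size_bytes=info.st_size,
        modified_utc=datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        labels=dict(labels),
    )


if __name__ == "__main__":
    # Create a small throwaway file so the example is self-contained.
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as handle:
        handle.write("customer_id,email\n1,a@example.com\n")
        sample_path = handle.name
    print(describe(sample_path, owner="marketing", contains_pii="yes"))
    os.remove(sample_path)
```

The size, location, and timestamp come from the storage system for free; the labels are the part someone has to decide on and maintain.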
DH: I’ve heard the term ‘semantic’ metadata. Can you elaborate on exactly what is meant by that and whether it’s different from just plain metadata?
MG: Semantic data is the language of the business. Business stakeholders impart their language onto the data in order to make sense of what they have and how it is used to support decisions and actions. Semantic metadata can be labels that define a customer, business, or industry. It can be data that shows a parent-child relationship between a headquarters office and branch location. Semantic metadata can be classifications and tags to communicate how and when to use data, such as including a customer in an email communication about an upcoming sale. Semantic data is what activates information for business value.
DH: Who and what roles or functions in an organization can benefit most from cataloging, managing, enriching and using metadata, and how might they benefit?
MG: Data catalogs are the hubs that capture, define and maintain metadata. Think of these environments as the marketplace for data. Data catalogs show what data is available, insight about what it is, and what policies govern quality, access, use, and lifecycle. Data catalogs tell where information came from, the provenance and lineage, to know if the information is trustworthy, how it might have changed, and who owns the data if questions arise. The metadata in data catalogs provides the instructions that storage analysts and data management teams use to administer, troubleshoot, migrate, deliver, and steward data.
DH: What are some ways that organizations can exploit metadata for their benefit?
MG: Best practices to capture, create, and manage metadata ensure data is ready for use and compliant with corporate and regulatory policies. Upstream back office, front office and edge systems and applications all require metadata to generate insights from analytics and automate decisions and actions. When metadata is not available, delays in information access and delivery can cause firms to miss contractual deadlines or specifications, leading to additional cost or fines. Privacy controls cannot be met when information containing PII (personally identifiable information) is not clearly identified. Metadata allows analysts and data scientists to quickly gather information and create insights for strategic decisions or machine learning models that automate personalized offers or predict when parts need maintenance. Metadata also makes a difference for storage analysis. Metadata helps to understand and anticipate actions to take with data based on storage size, type, lifecycle, policy and utilization. These dimensions indicate how to optimize storage and infrastructure to support information demands for security, delivery, quality, and lifecycle management.
DH: If an organization wants to start down the path toward strategic metadata management, where and how should they begin?
MG: Everything begins with analytics. You can’t manage what you don’t understand. Metadata management begins with data discovery and profiling. Granular metadata analysis helps to uncover metadata that is already available about the content and generates statistics on the quality. This same granular analysis can surface gaps or suggest where additional metadata can be added for better retrieval, insight consumption, management and governance. Data catalogs can then automatically link what has been learned from the metadata back to data management and governance policies. Storage analysts and information stakeholders are then able to add their own knowledge about the content to label and classify information. Through these activities, Chief Data Officers and data management leaders can strategically manage data not only toward IT SLAs and cost efficiency, but they are able to open data up to bigger business opportunities where data value leads to strategic business outcomes.
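As a rough illustration of the profiling step Michele describes, the Python sketch below measures how complete a set of metadata records is and surfaces the gaps. Everything in it is assumed for the example: the required labels (`owner`, `classification`, `retention`), the record layout, and the sample paths are hypothetical and do not come from any particular catalog product.

```python
from collections import Counter

# Hypothetical governance policy: labels every record is expected to carry.
REQUIRED_LABELS = ("owner", "classification", "retention")

# Hypothetical sample of already-discovered metadata records.
SAMPLE_RECORDS = [
    {"path": "/data/sales.csv",
     "labels": {"owner": "sales", "classification": "internal", "retention": "7y"}},
    {"path": "/data/crm.parquet", "labels": {"owner": "marketing"}},
    {"path": "/data/scan.tiff", "labels": {}},
]


def profile(records):
    """Return (completeness per label, missing labels per path)."""
    total = len(records)
    filled = Counter()
    gaps = {}
    for record in records:
        labels = record.get("labels", {})
        # A label counts as missing when it is absent or empty.
        missing = [name for name in REQUIRED_LABELS if not labels.get(name)]
        for name in REQUIRED_LABELS:
            if name not in missing:
                filled[name] += 1
        if missing:
            gaps[record["path"]] = missing
    completeness = {
        name: (filled[name] / total if total else 0.0) for name in REQUIRED_LABELS
    }
    return completeness, gaps


if __name__ == "__main__":
    completeness, gaps = profile(SAMPLE_RECORDS)
    for name, share in completeness.items():
        print(f"{name}: {share:.0%} of records labeled")
    for path, missing in gaps.items():
        print(f"{path} is missing: {', '.join(missing)}")
```

The completeness figures are the "statistics on the quality"; the per-file gap list is where stakeholders would add their own knowledge to label and classify information.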
Continue the conversation with us…
If you find this topic interesting, we invite you to continue the conversation with us on September 19 when I’ll host Michele in a webcast and Q&A event titled Heating Up Cold Data with Metadata. In this webcast, Michele will present research commissioned by IBM and conducted by Forrester Consulting that considers some of these questions and more. A live Q&A session will follow, so bring your questions, share your own experiences, or just listen in to learn more about how your business can increase efficiency, improve data governance, and increase competitive advantage with insights gained through effective metadata management.
Register here to attend.
The post A conversation about metadata management featuring Forrester Analyst Michele Goetz appeared first on IBM IT Infrastructure Blog.