As commercial, governmental, and research organizations continue to move from manual pipelines to automated processing of their vast and growing datasets, they are struggling to find meaning in their repositories.
Many products and approaches now provide data discoverability through indexing and aggregate counts, but few also provide the level of confidence needed for making strong assertions about data provenance. For that, a system needs policy to be enforced; a model for data governance that provides understanding about what is in the system and how it came to be.
With an open, policy-based platform, metadata can be elevated beyond assisting in just search and discoverability. Metadata can associate datasets, help build cohorts for analysis, coordinate data movement and scheduling, and drive the very policy that provides the data governance.
Data management should be data centric, and metadata driven.